PREFACE



The National Postsecondary Education Cooperative (NPEC) was authorized by Congress in 1994.
Congress charged the National Center for Education Statistics to establish a national postsecondary cooperative
to promote comparable and uniform information and data at the federal, state, and institutional levels. In 
accordance with this charge, the projects supported by the Cooperative do not necessarily represent a 
federal interest, but may represent a state or institutional interest. Such is the case with this Sourcebook. 
While there is no federal mandate to assess the cognitive outcomes of postsecondary education, some 
states and many institutions have identified cognitive assessment as a way of examining the outcomes of 
their educational programs. This project was undertaken to facilitate these efforts. 

In a climate of accelerating costs and greater requirements for high-quality services, policymakers 
are attempting to understand the value of higher education and are demanding greater accountability from 
institutions. Concurrently, accreditation agencies are requiring assessment of student outcomes as an 
integral part of the accreditation process. Increasingly, colleges and universities are being asked for more 
direct measures of student outcomes. How much did students learn? Did they learn the “right things”? 
Did they complete college prepared for employment? And postsecondary education is increasingly asking 
itself: What information really answers these questions? How do we measure what was learned? Can 
institutions that have different missions or that deliver instruction using different learning modes respond 
in a comparable way? 

The National Postsecondary Education Cooperative (NPEC), in its first council meeting (held in 
the fall of 1995), identified the assessment of student outcomes as a high priority. The NPEC Steering 
Committee appointed two working groups, Student Outcomes from a Policy Perspective and Student 
Outcomes from a Data Perspective, to explore the nature of data on student outcomes and their usefulness 
in policymaking. The exploratory framework developed by the policy working group is presented in the 
paper Student Outcomes Information for Policy-Making (Terenzini 1997) (see 
http://nces.ed.gov/pubs97/97991.pdf). Recommendations for changes to current data collection, analysis,
and reporting on student outcomes are included in the paper Enhancing the Quality and Use of Student 
Outcomes Data (Gray and Grace 1997) (see http://nces.ed.gov/pubs97/97992.pdf). Based on the work
undertaken for these reports, both working groups endorsed a pilot study of the Terenzini framework and
future research on outcomes data and methodological problems. 

In 1997, a new working group was formed to review the framework proposed by Terenzini vis-à-vis
existing measures for selected student outcomes. The working group divided into two subgroups. One
group focused on cognitive outcomes, and the other concentrated on preparation for employment 
outcomes. The cognitive outcomes group produced two products authored by T. Dary Erwin, a consultant 
to the working group: The NPEC Sourcebook on Assessment, Volume 1: Definitions and Assessment
Methods for Critical Thinking, Problem Solving, and Writing; and The NPEC Sourcebook on Assessment,
Volume 2: Selected Institutions Utilizing Assessment Results. Both publications can be viewed on the
NPEC Web site at http://nces.ed.gov/npec/ under “Products.” 

The NPEC Sourcebook on Assessment, Volume 1: Definitions and Assessment Methods for Critical
Thinking, Problem Solving, and Writing is a compendium of information about tests used to assess the
three skills. Volume 1 is a tool for people who are seeking comparative data about the policy-relevance of 
specific student outcomes measured in these areas. The interactive version of Volume 1 (see 
http://nces.ed.gov/npec/evaltests/) allows users to specify their area(s) of interest and create a customized
search of assessment measures within the three domain areas.

Volume 1 should be regarded as a work in progress and has certain limitations. First, it focuses on
three kinds of student outcomes: critical thinking, problem solving, and writing. The Student Outcomes 
Working Group recognizes that there are many more outcome variables and measures that are of interest 
to postsecondary education constituents. Second, Volume 1 describes tests that are designed, for the most 
part, to measure cognitive variables for traditional students. It does not describe more “nontraditional” 
methods such as portfolios and competencies. Similarly, the tests themselves are not assessed with 
nontraditional settings in mind. Finally, the evaluations of the tests found in this volume are based mainly 
on the way the developers of the tests represent them in their materials and, in some cases, on material 
available through third-party test reviews. Each prospective user of any of the tests must evaluate the 
test’s appropriateness for the user’s own particular circumstances. Different needs, motivations, and 
focuses affect the utilization of the various assessments. 

The tests described in Volume 1 are those that the consultant to the group was able to identify 
through careful searching and consideration. Some tests may have been inadvertently missed. Also, the 
comments in the book are not to be taken as a recommendation or condemnation of any test, but rather as 
a description. The descriptive process used is unique to NPEC and was developed for the purpose of the 
Student Outcomes Working Group project. We intend to update this volume on an as-needed basis.
Updates will be available at the NPEC web site: http://nces.ed.gov/npec/evaltests/. 

The NPEC Sourcebook on Assessment, Volume 1 is a companion volume to The NPEC Sourcebook
on Assessment, Volume 2. Volume 2 provides eight case studies of institutions that have addressed
policy-related issues through the use of the assessment methods presented in Volume 1.

Your comments on Volume 1 are always welcome. We are particularly interested in your 
suggestions concerning student outcomes variables and measures, potentially useful products, and other 
projects that might be appropriately linked with future NPEC student outcomes efforts. Please e-mail your 
suggestions to Nancy Borkow (Nancy_Borkow@ed.gov), the NPEC Project Director at the National
Center for Education Statistics. 



Toni Larson, Chair 

NPEC Student Outcomes Pilot Working Group: 
Cognitive and Intellectual Development



1. GENERAL AND SPECIFIC ISSUES IN SELECTING ASSESSMENTS



1.1 Introduction 

The educational goals for the year 2000, announced by the President of the United States and 
state governors in 1990, included the abilities to think critically, solve problems, and communicate. In a 
national response to the educational goals, a list of communication and critical thinking skills was 
obtained from a study of 500 faculty, employers, and policymakers who were asked to identify the skills 
that these groups believe college graduates should achieve (Jones et al. 1995). To address these national 
concerns, there is a need to provide evidence of attainment of these essential skills in general education. 
Providing the assessment results of general education gives proof of “return” to policymakers, as general 
education assessment enables collection of all students’ performance, regardless of individual major. A 
variety of assessment methods have been developed to measure attainment of these skills. This report will 
present definitions of critical thinking, problem solving, and writing, along with a detailed review of 
assessment methods currently available. 

In addition to specific information pertaining to critical thinking, problem solving, and 
writing, there are general issues pertaining to the assessment of these skills. Definitions of the particular 
conceptual and methodological criteria that play a key role in evaluating and selecting assessments for use 
in higher education are outlined in the first section. More specifically, issues to be examined in this
section include the following: relevance to policy issues, utility for guiding specified policy objectives, 
applicability to multiple stakeholder groups, interpretability, credibility, fairness, scope of the data 
generated, availability or accessibility for specified/diversified purposes, measurability considerations, 
and cost. In the second section, the test format (multiple-choice vs. performance-based), which impacts 
the type of data generated and the resultant inferences that are justified, will be reviewed. The last section 
gives a detailed description of methodological concerns, such as reliability, validity, and method design. 
Because of the many factors to consider when undertaking a testing project, an assessment specialist who 
can create a comprehensive testing plan that accounts for conceptual and methodological issues as well as 
other factors relevant to the outcomes should be consulted. Due to the limitations in length of this report, 
only conceptual and methodological considerations will be discussed, but readers should take note that 
there are variables not explained in this report that greatly impact test selection (e.g., student motivation,
the sample chosen, or the assessment design). 



1.2 Selection of Assessment Methods: Specific and General Considerations 

With the development of critical thinking, problem solving, and writing skills being 
increasingly recognized as integral goals of undergraduate education, a number of different measures 
have been designed across the country. Selection of an appropriate instrument or strategy for evaluating 
students’ competencies in these areas often depends on whether the assessment is formative or summative 
in nature. In formative evaluation the goal is to provide feedback, with the aim of improving teaching, 
learning, and the curricula; to identify individual students’ academic strengths and weaknesses; or to 
assist institutions with appropriate placement of individual students based on their particular learning 
needs. Summative evaluation, on the other hand, tends to be used to make decisions regarding allocation 
of funds and to aid in decisionmaking at the program level (e.g., personnel, certification, etc.). Data are 
derived from a summative assessment chiefly for accountability purposes and can therefore be used to 
meet the demands of accrediting bodies, and state and federal agencies. 

Once an institution identifies the specific purpose of its assessment and defines the particular 
critical thinking, problem solving, or writing skills it is interested in measuring, selection of the 
appropriate test becomes much easier. In some cases, there is not a measure that adequately examines the 
forms of student achievement that have been the focus of curriculum objectives, producing a need to
develop a test locally. When the type of assessment falls into the formative category, often only outcome 
data derived from locally developed tests provide enough congruence with the learning objectives and 
curriculum aims, in addition to yielding a sufficient quantity of information, to guide decisionmaking. 
This is certainly not always the case, and oftentimes an institution will find a commercially produced test 
that samples content and/or skill areas that were emphasized in their programs in addition to providing 
detailed student reports. When an assessment is conducted for external purposes, typically the widely 
recognized, commercially produced assessments are preferred. Unfortunately, if measures are selected for 
this reason only, institutions may end up with a measure that is not valid for use with their unique student 
population or particular programs. For example, an innovative general education program that emphasizes 
the development of critical thinking in the context of writing instruction might focus on students learning 
to write essays reflecting substantial critical thinking and integration of ideas. If the students are tested 
with a multiple-choice writing assessment, emphasizing mechanics and editing, the degree to which the 
program has met its objectives would not be legitimately measured. 

Conceptual Considerations 



Regardless of the specific objectives associated with a given assessment approach, a number 
of conceptual considerations should enter into the decision to use a particular measure. First, if the 
outcome data will be used for making a decision regarding an important policy issue, how relevant is the 
outcome to the particular issue at hand? For example, if an assessment is conducted to determine those 
writing skills needed for college graduates to function effectively in the business world, the context of an 
essay test should probably include products such as writing letters and formal reports rather than 
completing a literary analysis of a poem. 

A second critical conceptual issue relates to utility, or the potential of data generated from a 
particular measure to guide action directed toward achieving a policy objective. For instance, a policy 
objective might involve provision of resources based on institutions’ sensitivity to the learning needs of 
students from demographically diverse backgrounds. It would be difficult to convince funding agencies 
that students’ individual needs are being diagnosed and addressed with a measure that is culturally biased 
in favor of white middle-class students. Ewell and Jones (1993) noted that indirect measures often help 
individual colleges and universities improve instruction, but such measures tend to be less effective in 
terms of providing a clear focus of energy for mobilizing public support for national improvement. They 
base this judgment on the fact that data originating from many different types of institutions cannot be 
usefully combined into a single summary statistic without substantial distortion and loss of validity. 

Sell (1989) has offered several suggestions for enhancing the utilization of assessment 
information. These include the following: (1) attending to institutional characteristics and readiness to 
change in the design and implementation of assessment strategies; (2) ensuring the data are valid, reliable, 
and credible; (3) providing information in a concise and timely manner; (4) involving potential audiences 
(users) in the process; and (5) providing extensive feedback and consultation regarding recommended 
changes. 



Applicability of assessment measures relates to the extent to which information on a 
particular outcome measure meets the needs of multiple stakeholder groups. In other words, to what 
extent will data generated from a critical thinking, problem solving, or writing assessment yield 
information that can be used by multiple groups, such as faculty and administrators who wish to improve 
programs, or government officials and prospective employers who desire documentation of skill level 
achievement or attainment? 

A fourth critical conceptual issue pertains to the interpretability of the test information. 
Will the outcome data be provided in a format that is comprehensible to individuals with different 
backgrounds? Data generated must be readily consumable, or individuals trained to interpret outcome 
data need to be available to translate score data into a form that can be readily understood by
decisionmakers who will use the data. 

Credibility, which refers to how believable the information generated by a particular 
outcome is for policymakers, represents a fifth dimension of outcomes that should be incorporated into 
the selection process. Credibility is a multidimensional quality, with some overlap with the other 
dimensions. Credibility is established based on the amount of time, energy, and expertise that goes into a 
particular measure; the psychometric qualities associated with a test; the ease of interpretation of the 
materials and results; the amount of detail provided pertaining to student outcomes; and the cultural 
fairness of the test. Moreover, the credibility of outcome data is perhaps most closely tied to the degree to 
which the assessment information is conceptually related to the actual skills deemed important. 
Credibility, hence, is a part of validity, in that the validation process involves justifying or supporting the 
types of inferences drawn from data, which includes issues of fairness, the evaluation of psychometric 
properties of a test, and most importantly the interpretation of information (Messick 1981). Information 
pertaining to credibility will often be found through validation of test results (i.e., how congruent is test 
performance to the identified skills). Generally speaking, the results obtained with direct assessments 
have become more accepted as credible measures of learning to think critically, solve problems, and write 
effectively than nonperformance-based assessments, such as reports of student satisfaction or descriptions 
of student academic activities. 

Although cultural fairness is an important element in the overall credibility of a measure, it 
also constitutes a primary conceptual consideration. The information yielded by a particular assessment 
approach should not be biased or misleading in favor of particular groups. Bias can be subtle, requiring 
extensive analysis of item content and analysis of performance by students with comparable abilities, who 
differ only in terms of group association, to ensure fairness. A measurement analysis, Differential Item 
Functioning (DIF), allows for the control of ability level so that bias can be detected. In this way, cultural 
fairness is a measurement issue. 
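
One widely used way to carry out such an analysis is the Mantel-Haenszel procedure. It is offered here only as an illustration, since this report does not prescribe a particular DIF method. The Python sketch below matches examinees on total test score (a stand-in for ability) and then compares the odds of answering a single item correctly in a reference group and a focal group. The function name, the group labels "ref" and "focal", and the record layout are assumptions made for the example.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio for one item across ability strata.

    records: iterable of (group, total_score, item_correct) tuples, where
    group is "ref" or "focal" (hypothetical labels) and item_correct is 0 or 1.
    """
    # For each total-score stratum, count [right, wrong] per group.
    strata = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, total_score, item_correct in records:
        strata[total_score][group][0 if item_correct else 1] += 1

    numerator = 0.0
    denominator = 0.0
    for cells in strata.values():
        ref_right, ref_wrong = cells["ref"]
        focal_right, focal_wrong = cells["focal"]
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n

    # Undefined when no stratum contributes to the denominator.
    return numerator / denominator if denominator else float("nan")
```

A common odds ratio near 1.0 suggests that, among examinees of comparable ability, the item behaves similarly in both groups. Values well above or below 1.0 flag the item for closer review of its content.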



Methodological Considerations 



In addition to the preceding conceptual considerations, several methodological criteria 
should be examined when critical thinking, problem solving, and writing assessments are selected. First, 
the scope of the data needed should be considered. If “census-type” data drawn from all students in 
attendance at all institutions in a particular locale are needed, then researchers should opt for measures 
that can be efficiently administered and scored in addition to measures that assess skills and content 
which are universally covered across curricula. However, if the scope of data needed is more restricted (of 
the “knowledge-base” type), with examinees selected via sampling strategies requiring fewer participants 
(perhaps drawn from particular institutions or regions), then measures designed to assess more highly 
specified curriculum-based skills can be used. Moss (1994) noted that there tends to be an inverse 
relationship between the number of students that can be tested and the complexity, depth, and breadth of 
outcome information that can be provided due to budgetary considerations. For the purposes of 
accountability, it is not necessary to assess every student to derive valid estimates of system performance, 
and a much wider range of outcome data can be generated when careful sampling is conducted. 
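
The arithmetic behind this point can be illustrated with the standard error of a mean score. The Python sketch below assumes simple random sampling without replacement and a known or estimated standard deviation of scores. The function name and the figures in the note that follows are hypothetical and do not come from the studies cited.

```python
import math


def standard_error_of_mean(score_sd, sample_size, population_size):
    """Standard error of a sample mean with the finite population correction.

    Assumes simple random sampling without replacement.
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    if population_size == 1:
        return 0.0
    # The correction shrinks the error as the sample approaches a full census.
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return score_sd / math.sqrt(sample_size) * fpc
```

For instance, in a hypothetical system of 10,000 students with a score standard deviation of 10 points, standard_error_of_mean(10, 400, 10000) comes to just under half a point. Testing the remaining 9,600 students would add little precision to a system-level estimate, which frees resources for richer measures.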

Availability of appropriate outcome measures represents a second methodological 
consideration. This refers to issues revolving around the availability of existing measures, the feasibility 
of developing new measures, and the logistics of using specified measures (both of the commercially 
available and locally developed variety). For instance, do the facilities and personnel exist for analysis 
and storage of data? Can the data be readily collected and the results disseminated without too much 
difficulty? Are the competencies and abilities of the individuals involved consistent with the tasks 
involved? Is the selected measurement strategy feasible with existing funds? How does the cost of one 
outcome measure compare to the cost of another? 




Measurability refers to how the outcome is operationally defined and measured, including 
the methodological soundness of the chosen measures. A number of different approaches to assessing the 
constructs of critical thinking, problem solving, and writing ability are available in the literature; 
however, individuals involved in any particular assessment must arrive at a definition that is specific 
enough to be translated into definitive assessment objectives. In addition to construct definitions, 
reliability and validity of an assessment instrument must be carefully scrutinized to match the appropriate 
assessment test with the test givers’ objectives. There is a critical validity issue with particular relevance 
to direct measures of ability. Although direct assessments may possess high content validity, it is 
important that they are not considered “exempt from the need to marshal evidence in support of their use” 
(Powers, Fowles, and Willard 1994). For example, it is essential to establish a clear link between 
performance on a particular direct writing assessment and demonstrated writing on both concurrent (such 
as grades in a writing class) and future performances (demonstrating competence in graduate courses 
requiring writing or on-the-job writing tasks). Although the inferential leaps between authentic measures 
of abilities and actual tasks encountered in coursework or elsewhere are substantially reduced when direct 
measures are used, the need to provide validation of a test for a particular use remains the same (Powers, 
Fowles, and Willard 1994). 



Multiple-Choice Measures 



Assessment of critical thinking, problem solving, and writing in higher education has 
traditionally taken two forms: direct (constructed response) and indirect (multiple-choice) measurement. 
Indirect assessments involve an estimate of the examinee’s probable skill level based on observations of 
knowledge about skill level (i.e., to assess writing, one would observe vocabulary, grammar, sentence 
structure, etc.). Indirect assessments are exemplified by many of the standardized, commercially available 
tests. Perhaps the most frequently cited advantage of multiple-choice tests is the high reliability estimates 
often associated with them. Indirect assessments also tend to possess higher predictive validity with a 
variety of outcome measures, such as college GPA or scores on other standardized tests. An additional 
advantage is ease of scoring. Scoring is less time consuming and costly because computers can be readily 
used. Two other benefits are the enhanced political leverage of outcomes derived from indirect assessments, which
stems from their extensive development process, and the general familiarity associated with commercially
designed tests.

One of the commonly cited disadvantages of indirect assessment involves the time and 
resources needed to develop and revise the tests. Further, many have argued that indirect assessments 
dramatically under-represent the construct. For instance, when writing or critical thinking is defined as a 
process, multiple-choice tests do not adequately represent the definition. Inferences about the processes 
students use to arrive at the correct choice on a multiple-choice test are often made, but their accuracy
has been questioned. Ewell and Jones (1993) point out that conclusions drawn from indirect indicators are
highly inferential even when the data are presented from multiple measures. White (1993) contends that 
many indirect assessments fail to assess higher-order thinking skills. Finally, allegations of bias based on 
gender, race, and language have been leveled against specific multiple-choice tests, and there is some 
evidence suggesting that the selected response format may generally favor certain groups more than the 
constructed format or essay-type test (Koenig and Mitchell 1988; White and Thomas 1981). However, 
general conclusions such as this should be viewed very cautiously, as the majority of available critical 
thinking, problem solving, and writing assessments have not been systematically examined for evidence 
of bias. 



Essay Tests 



Direct assessments involve evaluation of a sample of an examinee’s skill obtained under 
controlled or real life conditions by one or more judges, and are most frequently associated with the timed 
essay format. The specific types of essay assessments may be classified in terms of the types of tasks 
employed and/or the scoring method implemented. Breland (1983) identified nine different types of tasks
employed in direct measures of writing. Each of these will be described briefly. An examinee may be 
directed to write a letter to a friend, a potential employer, a politician, or an editor. Another type of essay 
prompt, termed a narrative, requires the student to write a personal account of an experience or convey 
the details of a particular story or historical event. Narratives can be real or imaginary. The descriptive 
format requires that the writer describe an object, place, or person, with the goal of creating a vivid image 
or impression in the reader’s mind. An argumentative prompt (also referred to as a persuasive task) 
instructs the examinee to adopt a position on an issue and present a persuasive argument in favor of the 
chosen side using relevant information obtained through personal experience and/or reading. For an 
expressive task, the examinee simply conveys his or her own personal opinion on a particular issue or 
event. With a role-playing prompt, the student is asked to assume a role in some situation and write a 
response to a given situation. A precis or abstract requires a summary or synthesis of a large body of 
information. The purpose of a diary entry is personal usage necessitating an informal tone, and finally, a 
literary analysis requires interpretation of a passage or other literary work. 

Several benefits of essay tests in general have been touted, including the following: (1) 
enhanced construct validity; (2) reduced racial bias; (3) faculty involvement in development and scoring, 
leading to more awareness of the central role of critical thinking, problem solving, and writing in the 
college curriculum; and (4) the flexibility to assess a wider range of skills than is feasible with the 
multiple-choice format. Although essay tests have earned increasing support from faculty, administrators, 
and test development experts in recent years, many professionals who are committed to the process model 
of writing object strongly to the timed essay as it precludes revision. Many adherents of a process 
definition of writing believe that revision represents the most critical part of the process, and when
revision skills are not measured, an essential component of the construct is neglected. A disadvantage of 
critical thinking essay tests is that the ability to write is often entangled with the measurement of critical 
thinking ability. Essay tests have also been criticized because they are routinely conducted in artificial 
settings, provide only a small sample of the universe of writing, and have compromised reliability. 

Although this report will focus on specific assessment instruments and measurement issues 
surrounding each test, there will be no discussion of implementation issues at the state or university level. 
This information, although beyond the scope of this report, is still pivotal in selecting an assessment test. 
For instance, sample size, time of testing, the audience, and assessment design (pre/post-testing) are just a 
few examples of variables that greatly affect assessment outcomes. Such factors and many others should 
be reviewed with an assessment specialist before a measure is chosen. In addition to implementation 
issues, there are methodological and conceptual considerations that should steer the test selection process. 
Many of the considerations overlap, as in the cases of credibility and validity or cultural fairness and 
measurability. Therefore, the methodological and conceptual considerations are not independent issues, 
but parts of a whole that create a comprehensive and rigorous test selection process. 



1.3 Test Properties 



One of the methodological considerations in test selection involves the psychometric 
properties of a test. The test tables or templates provide a condensed review of studies that address the 
psychometric qualities of critical thinking, problem solving, and writing tests. The first column indicates 
the test name, author(s), publisher, date of publication, testing time, and cost. Any special comments or 
notes about the tests are at the bottom of this column. The second column gives the name(s) of the 
reported scores. Often tests have a total score and then several subtest scores. Whether or not subtest 
scores can be reported independently varies from test to test. The Definition column includes critical 
thinking, problem solving, or writing as defined by the author. It is important to note that the test items 
should match the definition given by the author(s). The next column, Reliability, involves the consistency
of scores across a test. The statistics reported under this column will be addressed further in the report. 
Method Design combines both reliability and validity issues concerning the internal structure of a test. 
Next is the Validity column, which gives information about studies that have implemented the tests.
Readers should especially take note of studies conducted independently of test authors. The last column, 
Correlation with Other Measures, is a form of validity, and is given a separate section, due to the amount 
of information found for most tests. A review of correlations can be found under the heading Validity.
The following section is meant as a brief review of statistical procedures. For a more extensive 
explanation of reliability, validity, correlations, and method design issues, see Crocker and Algina (1986), 
Feldt and Brennan (1989), or Cole and Moss (1989).



Reliability 



Reliability is an estimate of test takers’ performance consistency internally, across time, test 
forms, and raters (when applicable). Tests are not reliable in and of themselves, but the scores generated 
from the tests can be reliable. This means that across varying populations, reliability estimates may 
change. Important factors to consider when interpreting reliability estimates are the following: longer 
tests tend to be more reliable, reliability fluctuates with test takers, speeded tests can change the reliability 
estimate, homogeneity of test taker ability lowers the reliability, different levels of skill may be measured 
with different levels of accuracy, and longer time intervals for test-retest reliability lower the reliability 
estimate. With these factors in mind, different types of reliability estimates will be reviewed. Generally, 
reliability estimates above .70 indicate an acceptable level, although values of .80 and above are more
commonly accepted reliabilities.

Internal consistency can be measured using several methods. Coefficient Alpha, split-half,
KR-20, and inter-rater reliability are the four methods reported in the context of the test reviews. Internal 
consistency is another term for a test of item homogeneity. Item homogeneity indicates that content and 
item quality are consistent throughout the test. This reliability coefficient ranges from 0 to 1.0, 
representing the degree of relationship among items on a test. A test with homogenous or more related 
items will produce higher reliability coefficients (values closer to 1.0). 

The most often used estimate of internal consistency is Alpha, indicated as “internal 
consistency” on the templates. For instance, the California Critical Thinking Dispositions Inventory 
(Facione and Facione 1992) has internal consistency coefficients ranging from .75 to .96, indicating that 
the items are highly related. The KR-20, another reliability estimate reported in the templates, can be 
interpreted in the same manner as Alpha. The Critical Thinking Test of the CAAP (American College 
Testing Program 1989) has a KR-20 value of .81 to .82, indicating that it is a reliable measure with 
homogeneous items. 
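
As a rough illustration of how coefficient Alpha is computed, the sketch below applies the usual formula, k/(k - 1) times (1 minus the sum of the item variances divided by the variance of the total scores). The function name and the response data are hypothetical and are not taken from any test reviewed here. With right/wrong (1/0) items and population variances, as used here, the same calculation gives the KR-20.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of test takers' item-score rows."""
    k = len(scores[0])                          # number of items
    items = list(zip(*scores))                  # one tuple of scores per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical right/wrong (1/0) answers: 5 test takers by 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

A value near 1.0 would suggest closely related items. Real estimates depend on the sample of test takers, as noted above.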

Split-half reliability estimates represent another internal consistency method. The most 
often used method of split-half reliability involves using the even-numbered items to create one half-test and the 
odd-numbered items to compose the second half-test. In addition, test content can determine the division of items 
on a test. The same students are given each half-test and the scores are correlated, giving a coefficient of 
equivalence. As an overall reliability measure, the split-half reliability will give an underestimate of total 
test reliability, due to fewer items. The utility of the estimate is that item homogeneity is tested. In the 
case of the Watson-Glaser Critical Thinking Appraisal (Watson and Glaser 1980), the split-half reliability 
estimates ranged from .69 to .85, indicating item homogeneity and a reliable measure. 
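
The odd-even split can be sketched as follows. Because each half-test has only half the items, the half-test correlation is often "stepped up" with the Spearman-Brown formula, 2r/(1 + r), to estimate full-length reliability. That correction is a standard technique but is an addition here, not something the reviewed manuals are stated to have used, and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half(scores):
    """Return (half-test correlation, Spearman-Brown stepped-up estimate)."""
    odd = [sum(row[0::2]) for row in scores]    # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]   # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return r_half, 2 * r_half / (1 + r_half)


# Hypothetical right/wrong (1/0) answers: 6 test takers by 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
r_half, r_full = split_half(responses)
print(round(r_half, 2), round(r_full, 2))
```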

Inter-rater reliabilities are estimated to find the consistency of scores across raters. The 
Reflective Judgement Interview (King and Kitchener 1994) was found to have an inter-rater reliability of 
.97 (Mines et al. 1990), indicating that across raters there was high consistency in scores. Although this 
measure gives some indication of consistency, it only considers consistency across raters. What if items 
affect the performance of individuals? Some items may be harder or easier for students and raters; 
therefore, inter-rater reliability is a limited reliability estimate for performance assessment. The 
Generalizability coefficient discussed later is a more extensive estimate of reliability. Related to inter-rater 
reliability is inter-rater agreement. Inter-rater agreement is not a reliability estimate, but rather an 
item-by-item percentage of agreement across raters. The inter-rater agreement percentages reflect the 
degree of similarity in ratings for each item. 
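
A minimal sketch of item-by-item agreement between two raters is given below. The ratings are hypothetical, and exact agreement (identical ratings) is assumed as the criterion. Some studies instead count ratings within one point as agreeing.

```python
def percent_agreement(rater_a, rater_b):
    """Per-item percentage of test takers given identical ratings by both raters."""
    n_people = len(rater_a)
    n_items = len(rater_a[0])
    result = []
    for j in range(n_items):
        same = sum(1 for a, b in zip(rater_a, rater_b) if a[j] == b[j])
        result.append(100.0 * same / n_people)
    return result


# Hypothetical ratings: 4 test takers by 2 items, scored by two raters.
rater_a = [[3, 4], [2, 2], [5, 4], [1, 3]]
rater_b = [[3, 4], [2, 2], [5, 4], [2, 3]]
agreement = percent_agreement(rater_a, rater_b)
print(agreement)
```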

Another estimate of reliability is test-retest reliability, which assesses test consistency over 
time. The same form of a test is given at different occasions that can vary from hours to days to weeks, or 
even years. The time interval may depend on factors such as content of the test or developmental and 
maturational considerations. The test-retest reliability estimate is often called the coefficient of stability, 
since it addresses test score stability over time. The Problem Solving Inventory (Heppner 1982) has been 
tested across various time intervals, with more reliable estimates found for shorter time intervals: .83 to .89 
across 2 weeks, .77 to .81 across 3 weeks, and .44 to .65 across 2 years (Heppner and Peterson 1982a; Ritchey, 
Carscaddon, and Morgan 1984). 
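
The coefficient of stability is typically the Pearson correlation between scores from the two testing occasions. The sketch below assumes that definition and uses hypothetical scores for five test takers.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores for the same 5 test takers on two occasions.
first_occasion = [10, 12, 14, 16, 18]
second_occasion = [11, 12, 15, 15, 19]
stability = pearson(first_occasion, second_occasion)
print(round(stability, 2))
```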

To test the consistency of two forms purported to be identical, alternate forms reliability is 
calculated. This method involves two versions of a test given to the same subjects on the same testing 
occasion. A correlation between the scores on each form indicates the alternate forms reliability, also 
called the coefficient of equivalence. The higher the correlation between the two sets of scores, the more 
equivalent the forms are considered. If two forms exist, alternate forms reliability is recommended. The 
Tasks in Critical Thinking tests have alternate forms with reliability across the varying skills (not the 
tasks) ranging from .17 to .90 (Educational Testing Service and the College Board 1989). These values 
indicate that some of the skills assessed by the tasks are reliable, while others fall in an unacceptable 
range. The Watson-Glaser Critical Thinking Appraisal reports an alternate forms reliability of .75, 
moderately supporting the use of the separate forms as identical. Intercorrelations among subscales 
are another form of alternative reliability evidence, which is reported under the Method Design 
section. 



The Generalizability coefficient estimates the consistency of scores while accounting for 
more than one source of error at a time. Instead of conducting a separate internal consistency study and 
an inter-rater reliability study, the two studies can be done at one time using a Generalizability study. A 
Generalizability study creates a G coefficient that can be interpreted as a reliability estimate. The Tasks in 
Critical Thinking (Educational Testing Service and the College Board 1989) have G coefficients ranging 
from .57 to .65, indicating that across raters and items, students’ scores are only moderately reliable. 

Method Design 

There are several methods used to support the structure of a test. The structure of a test 
includes the item representations on subtests and the test, along with the relationship of the subtests to one 
another. More developed tests will use procedures such as factor analysis and differential item analysis. 
Most tests will report item-total correlations or discrimination indices as support for the structure of the 
test. 



Factor analysis is a method that identifies the underlying constructs or factors among items. 
Each subtest is created from a set of items, which theoretically should correlate with one another, since 
they are purported to measure the same concept. By applying factor analysis, the relationships among the 
items can be understood. Factor loadings indicate the amount of relationship or contributing power an 
item has within a subtest or test. Therefore, higher factor loadings indicate items that are more strongly 
related. Optimally, factor analysis results should parallel the hypothesized structure of the test. For 
instance, support for the three subtest structure of the Problem Solving Inventory (Heppner 1982) was 
found using factor analysis (Heppner 1988; Chynoweth 1987; Heppner and Peterson 1982a). 

Another method used to validate test design is item-total correlations. These correlations 
reveal how well each item correlates with the total score. The larger the item-total correlation, the more 
the item contributes to the subscale or test. Values below .10 indicate an item does not measure the same 
construct as other items on the test, while negative values indicate an inverse relationship between an item and 
the total. An analysis of the item-total correlations for the California Critical Thinking Skills Test 
(CCTST) (Facione 1990a) revealed that many of the items did not correlate well with the total test or 
respective subtests. For instance, 10 out of the 34 items on the total test had values below .10 (Jacobs 
1995), indicating little relationship between these items and the total test. Erwin (1997) further supported 
Jacobs’ results, finding that 7 out of 34 of the items on the CCTST had item-total correlations below .10. 
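
The screening described above can be sketched as follows: correlate each item with the total score and flag items whose correlation falls below .10. The data and function names are hypothetical. Note that the total here includes the item itself, which is one common convention. A "corrected" version would exclude the item from the total.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(scores):
    """Correlation of each item with the total score (item included in total)."""
    totals = [sum(row) for row in scores]
    return [pearson([row[j] for row in scores], totals)
            for j in range(len(scores[0]))]


# Hypothetical right/wrong (1/0) answers: 6 test takers by 4 items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
item_total = item_total_correlations(responses)
flagged = [j + 1 for j, r in enumerate(item_total) if r < 0.10]
print([round(r, 2) for r in item_total], flagged)
```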

Validation of test design can also be supported with item discrimination indexes. Item 
discrimination indexes are a measure of the difference in item responses between high and low scorers. 
They range from 0 to 1.00, with values closer to 1.00 indicating higher discrimination. Greater item 
discrimination indexes suggest a test that is sensitive to differences in ability. The Cornell Critical 
Thinking Test (Ennis, Millman, and Tomko 1985) had indexes ranging from .20 to .24, suggesting 
moderate discrimination among high and low scorers. 
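
One common way to compute a discrimination index is the difference between the proportion of high scorers and the proportion of low scorers who answer an item correctly. The sketch below assumes that definition and the conventional upper and lower 27 percent groups. The source does not state which grouping the reviewed tests used, and the data are hypothetical.

```python
def discrimination_index(scores, item, fraction=0.27):
    """Proportion correct in the top group minus proportion correct in the bottom group."""
    ranked = sorted(scores, key=sum, reverse=True)   # highest total scores first
    n = max(1, round(len(ranked) * fraction))        # size of each extreme group
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(row[item] for row in upper) / n
    p_lower = sum(row[item] for row in lower) / n
    return p_upper - p_lower


# Hypothetical right/wrong (1/0) answers: 10 test takers by 4 items.
responses = [
    [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1],
    [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1], [0, 1, 1, 0],
    [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0],
]
d_first = discrimination_index(responses, 0)
d_last = discrimination_index(responses, 3)
print(round(d_first, 2), round(d_last, 2))
```

Under this definition a negative value is also possible and would mean that low scorers answered the item correctly more often than high scorers.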

Fairness, related to bias in testing, is usually focused on differences among test takers based 
on variables such as inclusion in a group. For instance, are there unintended differences between males 
and females on critical thinking tests? This is the typical argument in defining whether a test is “fair.” 
What is not considered in this argument is whether a difference in ability level actually exists across 
gender. Males or females may have a naturally higher competency level in critical thinking. In this case, it 
is important to know if items are fair indicators of ability across groups (gender, ethnicity, etc), not just 
whether groups score differently on items. 

Differential item functioning (DIF) analysis allows for the control of ability level, so that differences 
found in scores can be attributed to a variable other than ability. When items exhibit DIF they are considered 
“unfair,” meaning that individuals from one group are more likely to answer the item correctly than 
individuals from another group, even when ability levels are the same. Traditionally DIF is performed 
across groups such as gender and ethnicity. For instance, the Cornell Critical Thinking Test has four items 
that exhibit gender DIF. Three of the items were more likely to be answered correctly by males compared 
to females with similar critical thinking ability levels. Content analysis of the items revealed some 
hypotheses for the differing scores. Two of the items that males had a better chance of answering 
correctly pertained to stock cars, a subject perhaps more interesting to males than females. Whether or not the 
content contributed to the differences found, it is clear that males and females of similar ability levels do 
not have a fair chance at getting these items correct. By applying gender DIF analysis, ability levels were 
controlled and a true bias in the test could be found. 
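
The idea of comparing groups only after matching on ability can be illustrated with the Mantel-Haenszel common odds ratio, one widely used DIF statistic. This is an assumption for illustration: the source does not say which DIF procedure was applied to the Cornell test. Test takers are grouped into ability levels (for example, by total score), and within each level the odds of a correct answer are compared across groups. The records below are hypothetical.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """records: iterable of (group, ability_level, correct), group in {"ref", "focal"}."""
    # For each ability level, count [correct, incorrect] per group.
    strata = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, level, correct in records:
        strata[level][group][0 if correct else 1] += 1
    num = den = 0.0
    for cell in strata.values():
        a, b = cell["ref"]      # reference group: correct, incorrect
        c, d = cell["focal"]    # focal group: correct, incorrect
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


# Hypothetical responses to one item at two matched ability levels.
records = (
    [("ref", 1, True)] * 2 + [("ref", 1, False)] * 2
    + [("focal", 1, True)] * 1 + [("focal", 1, False)] * 3
    + [("ref", 2, True)] * 3 + [("ref", 2, False)] * 1
    + [("focal", 2, True)] * 2 + [("focal", 2, False)] * 2
)
odds_ratio = mantel_haenszel_odds_ratio(records)
print(round(odds_ratio, 2))
```

An odds ratio near 1.0 would suggest that matched test takers in both groups have similar chances on the item. Values well above 1.0 favor the reference group.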



Validity 



Validity involves “building a case” that a test is related to the construct it is intended to 
measure. There are three types of validity: content, criterion, and construct. The most important type of 
validation is construct validity, because it encompasses both content and criterion validity. Therefore, 
inferences made from test scores that have only content or criterion validation are not considered valid 
until construct validity is addressed. When reviewing validity studies in the templates, the external 
validation studies or studies conducted by those other than the test author should be given more 
consideration. External validation studies reveal the amount of use and exposure of the test and can be 
considered unbiased toward the outcomes of the study. 

Content validity deals with the conceptualization of the constructs. Is the content of the test 
representative of the construct (critical thinking or writing) it purports to measure? Does the test represent 
the test developer’s definition? Is there a discrepancy between the test developer’s definition and the test 
user’s definition? Do experts judge the test to measure the constructs adequately and appropriately? Tests 
that are conceptualized from theory have stronger content validity than tests that have no theoretical 
backing. The CCTST (Facione 1990a) is a good example of a test with strong content validation. The test 
was conceptualized from a definition of critical thinking developed by the American Philosophical 
Association and the California State University system. 



A second type of validation involves whether a test can be used to infer standing on another 
test or variable. This is called criterion validity. Criterion validity can be measured as predictive (i.e., 
how well one score predicts scores on another test), or as concurrent (i.e., how well one’s current standing 
on a given measure can be predicted from another measure). Typically variables such as class standing, 
GPA, grades, SAT scores, and other relevant tests are used in criterion validation studies. If, for instance, 
SAT scores did accurately predict critical thinking test scores, then it could be inferred that the critical 
thinking test and the SAT test are measuring similar abilities. A study by Mines et al. (1990) revealed that 
one subscale of the Cornell Critical Thinking Test (CCTT) (Ennis, Millman, and Tomko 1985) and three 
subscales of the Watson Glaser Critical Thinking Appraisal (WGCTA) (Watson and Glaser 1980) could 
accurately predict 50 percent of students’ Reflective Judgement Interview scores (King and Kitchener 
1994). The high level of prediction highlights that tests often measure the same construct, even if authors 
profess their tests to be based on different constructs. In general, more studies are needed relating critical 
thinking, problem solving, and writing to other criteria such as job performance or citizenship. 

Construct validity involves content and criterion validity. Construct validity specifically 
addresses the questions of whether the test measures the trait, attribute, or mental process it is purported 
to measure, and whether the scores should be used to describe test takers. Two methods of construct 
validation are correlation studies (convergent and divergent validity) and outcome analysis. To 
understand correlation studies, a brief review of correlations will be given. The correlation coefficient 
represents the amount of relationship between two variables and ranges from -1.00 to 0 to 1.00, with 
values closest to 1.00 and -1.00 indicating a strong relationship. A correlation coefficient from .10 to .20 
represents a small relationship, and values from .30 to .50 indicate moderate relationships between tests. 
A negative correlation, or inverse relationship, indicates that as one variable increases the other decreases. 
Some correlations are corrected for attenuation, which means corrected for unreliability. Measurement of 
variables always involves “error.” By removing the effect of this error, the correlation that would be 
expected between two perfectly reliable measures can be estimated. For instance, the correlation between 
the WGCTA and CCTT is .71, and when corrected for attenuation the correlation is .94, indicating that 
the lack of reliability in the two tests is accounting for the lower correlation. 
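
The standard correction for attenuation divides the observed correlation by the square root of the product of the two reliabilities. The sketch below assumes that formula. The input values are hypothetical and are not the reliabilities behind the WGCTA and CCTT figures quoted above.

```python
from math import sqrt


def disattenuate(r_xy, r_xx, r_yy):
    """Observed correlation r_xy corrected for the reliabilities r_xx and r_yy."""
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical values: observed r = .60, reliabilities of .80 and .90.
corrected = disattenuate(0.60, 0.80, 0.90)
print(round(corrected, 2))
```

The lower the reliabilities, the larger the gap between the observed and corrected correlations.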

Convergent and divergent validity involves finding the relationship of the critical thinking, 
problem solving, or writing test to other tests that measure similar and opposite constructs. The column 
Correlation with Other Measures on the templates represents convergent and divergent validity. To 
interpret correlations with other measures, one needs to understand the content behind the measures, and 
how they should logically be related. Two similarly conceptualized writing tests correlated with one 
another should produce moderate correlations around .40 to .60, since some overlap of content is 
expected. High correlation values could be considered indicators of a strong relationship, suggesting that 
individual tests may be measuring the same construct. Many critical thinking tests come under scrutiny as 
being measures of verbal ability. This criticism can be tested using correlation studies comparing critical 
thinking scores with SAT verbal scores or other verbal tests. The CCTT (Ennis, Millman, and Tomko 
1985) scores were correlated with SAT verbal scores (r = .36, .44), revealing that test scores were related 
to a moderate degree (Ennis, Millman, and Tomko 1985; Frisby 1992). Higher correlation values between 
critical thinking tests and verbal ability measures suggest that critical thinking test scores might actually 
be tapping into verbal ability. 

The last method of construct validity is to conduct experimental studies analyzing outcomes. 
If students take a critical thinking, problem solving, or writing course, the hypothesized outcome is that 
students would exhibit a gain in the appropriate skill from pre- to post-testing and would score higher 
compared to students who did not take the proposed course. These studies add substantial support to tests 
as measures of critical thinking, problem solving, and writing. Although significant differences across 
pre- and post-testing give an indication of change, the degree of change is not known. To calculate the 
degree of change, an effect size is used. Effect sizes are the standardized difference between the treatment 
groups (those who received skill training) and the control groups (those who did not receive skill 
training). By standardizing the group differences, comparisons can be made from one study to the next. 
An effect size of .50 indicates half a standard deviation difference between groups. For instance, the 
CAAP was reported to have an effect size of .41 for full-time students versus part-time students, 
indicating a .41 standard deviation increase for students enrolled full-time. Effect sizes should be 
interpreted in light of the degree of change that is expected or desired. 
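
A standardized group difference of this kind can be sketched as the difference between group means divided by a pooled standard deviation (often called Cohen's d). The pooled standard deviation is an assumption here, since studies differ in which standard deviation they use, and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def effect_size(treatment, control):
    """Mean difference between groups in pooled standard deviation units."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * variance(treatment)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical post-test scores for trained and untrained students.
trained = [12, 14, 16, 18, 20]
untrained = [11, 13, 15, 17, 19]
d = effect_size(trained, untrained)
print(round(d, 2))
```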

The reliability and validity of a test cover an immense amount of information regarding the 
consistency and usefulness of scores. As a first step in the review process, it should be noted that 
reliability must be established before validity issues are addressed. If scores are not consistent, then the 
inferences made will also be inconsistent. Once reliability is determined, the content of a test, most 
specifically the definition and domains covered by the test, should be examined for fit with the purpose of 
testing. Any outcome information regarding the content and inferences made from the test should help to 
guide the content review. Correlations with other measures can also help to clarify the tests’ relationships 
with other well-known variables. Perhaps the most important information comes from studies that 
investigate gains in ability not only across time, but across treatment. For instance, individuals receiving 
intense instruction in writing should out-perform those who do not receive training. If a test detects the 
differences in writing ability between these two groups, then the test is supported as a measure of writing. 
Overall, the review process is tedious and involved. Each test must be considered based on the merits of 
its structure, content, score consistency, and inferential potential, in addition to how these elements fit 
with the purpose of testing and the outcomes desired. 

2. CRITICAL THINKING AND PROBLEM SOLVING 



2.1 Introduction 

Critical thinking and problem solving have been identified as essential skills for college 
students. Many colleges across the nation have begun to teach courses based on these pertinent skills. For 
instance, Chaffee (1991) authored a book, Thinking Critically, which can be used as a curriculum guide. 
Although the importance of students demonstrating these skills has been determined, defining these terms 
and finding appropriate assessment methods are complex and involved tasks. In a national report on 
higher education, Jones et al. (1997, pp. 20-21) and Jones et al. (1995, p. 15) give comprehensive 
definitions of problem solving and critical thinking, making distinctions between the two terms. With a 
consensus among 500 policymakers, employers, and educators, the following definitions were created. 
Problem solving is defined as a step-by-step process of defining the problem, searching for information, 
and testing hypotheses with the understanding that there are a limited number of solutions. The goal of 
problem solving is to find and implement a solution, usually to a well-defined and well-structured 
problem. Critical thinking is a broader term describing reasoning in an open-ended manner, with an 
unlimited number of solutions. The critical thinking process involves constructing the situation and 
supporting the reasoning behind a solution. Traditionally, critical thinking and problem solving have been 
associated with different fields: critical thinking is rooted in the behavioral sciences, whereas problem 
solving is associated with the math and science disciplines. Although a distinction is made between the 
two concepts, in real life situations the terms critical thinking and problem solving are often used 
interchangeably. In addition, assessment tests frequently overlap or measure both skills. In keeping with 
the Jones et al. (1995, 1997) definitions, this report will analyze critical thinking and problem solving 
separately, yet attempt to integrate the two skills when appropriate. 



2.2 Definition of Critical Thinking 



A comprehensive definition of critical thinking, the product of studies by Jones et al. (1995, 
1997) can be found in tables 2-8. Critical thinking is defined in seven major categories: Interpretation, 
Analysis, Evaluation, Inference, Presenting Arguments, Reflection, and Dispositions. Within each of 
these categories are skills and subskills that concretely define critical thinking. As a content review of 
critical thinking assessment methods, comparisons were made for each test across the comprehensive 
definition of critical thinking. If test content addresses a skill, then the test acronym appears next to that 
skill. The following table indicates the tests and acronyms used. Tests were chosen for review based on 
two factors: (1) the ability to measure college students’ critical thinking skills and/or critical thinking 
dispositions, and (2) broad scale availability to colleges and universities. 

Table 1 — Test acronyms 

| Acronym | Test Name |
|---|---|
| A. PROFILE | Academic Profile |
| CAAP | Collegiate Assessment of Academic Proficiency |
| CCTDI | California Critical Thinking Dispositions Inventory |
| CTAB | CAAP Critical Thinking Assessment Battery |
| CCTST | California Critical Thinking Skills Test |
| CCTT | Cornell Critical Thinking Test |
| COMP | College Outcomes Measures Program - Objective Test |
| ETS TASKS | ETS Tasks in Critical Thinking |
| MID | Measure of Intellectual Development |
| PSI | Problem Solving Inventory |
| RJI | Reflective Judgement Interview |
| WGCTA | Watson Glaser Critical Thinking Appraisal |



Several methods were used to match the test content with the definition of critical thinking. 
For the Academic Profile, CAAP, CCTDI, CTAB, CCTST, COMP, and ETS Tasks, the definitions 
created by the author(s) were used as a guide in determining content on the test. For the CCTT, PSI, and 
WGCTA, the tests were reviewed to determine the content, due to the lack of specific skills or definitions 
given by the author(s) in the test manual. The RJI and MID, which are based on stages, were analyzed in 
light of the information that would be needed to separate individuals at different stages. It should also be 
noted that the PSI measures perceptions of critical thinking skills; therefore, if the PSI is indicated to 
measure a skill in the tables, it should be interpreted as measuring perception of that skill. Caution should 
be used in interpreting tables 2-8, due to the subjective process used to compare tests and definitions. 

Table 2 — Interpretation skills measured by critical thinking tests 



| Interpretation | A. Profile | CAAP | CCTDI | CTAB | CCTST | CCTT | COMP | ETS TASKS | MID | PSI | RJI | WGCTA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Categorization |   |   |   |   |   |   |   |   |   |   |   |   |
| 1. Formulate categories, distinctions, or frameworks to organize information in such a manner to aid comprehension. |   |   |   |   | * |   | * | * |   |   |   |   |
| 2. Translate information from one medium to another to aid comprehension without altering the intended meaning. |   |   |   |   | * |   |   | * |   |   |   |   |
| 3. Make comparisons; note similarities and differences between or among informational items. |   |   |   |   | * |   |   | * |   |   |   |   |
| 4. Classify and group data, findings, and opinions on the basis of attributes or a given criterion. |   |   |   |   | * |   |   | * |   |   |   |   |
| Detecting Indirect Persuasion |   |   |   |   |   |   |   |   |   |   |   |   |
| 1. Detect the use of strong emotional language or imagery that is intended to trigger a response in an audience. |   |   |   |   | * | * |   | * |   |   |   |   |
| 2. Detect the use of leading questions that are biased towards eliciting a preferred response. |   |   |   |   |   | * |   | * |   |   |   |   |
| 3. Detect “if, then” statements based on the false assumption that if the antecedent is true, so must be the consequence. |   |   |   |   | * | * |   |   |   |   |   | * |
| 4. Recognize the use of misleading language. |   |   |   |   |   | * |   | * |   |   |   |   |
| 5. Detect instances where irrelevant topics or considerations are brought into an argument that diverts attention from the original issues. |   |   |   |   | * | * |   | * |   |   |   | * |
| 6. Recognize the use of slanted definitions or comparisons that express a bias for or against a position. |   |   |   |   | * | * | * | * |   |   |   |   |
| Clarifying Meaning |   |   |   |   |   |   |   |   |   |   |   |   |
| 1. Recognize confusing, vague, or ambiguous language that requires clarification to increase comprehension. |   | * |   | * |   | * |   | * |   |   |   | * |
| 2. Ask relevant and penetrating questions to clarify facts, concepts, and relationships. |   |   |   |   |   |   |   |   |   |   |   |   |
| 3. Identify and seek additional resources, such as resources in print, which can help clarify communication. |   |   |   |   |   |   | * | * |   |   |   |   |
| 4. Develop analogies and other forms of comparisons to clarify meaning. |   |   |   |   |   |   |   | * |   |   |   |   |
| 5. Recognize contradictions and inconsistencies in written and verbal language, data, images, or symbols. |   |   |   |   | * | * |   |   |   |   |   | * |
| 6. Provide an example that helps to explain something or removes a troublesome ambiguity. |   |   |   |   |   |   | * |   | * |   |   |   |

Table 3 — Analysis skills measured by critical thinking tests 



| Analysis | A. Profile | CAAP | CCTDI | CTAB | CCTST | CCTT | COMP | ETS TASKS | MID | PSI | RJI | WGCTA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Examining Ideas and Purpose |   |   |   |   |   |   |   |   |   |   |   |   |
| 1. Recognize the relationship between the purpose of a communication and the problems or issues that must be resolved in achieving that purpose. |   |   |   |   |   |   |   |   |   |   |   |   |
| 2. Assess the constraints of the practical applications of an idea. |   |   |   |   |   |   |   |   |   |   |   |   |
| 3. Identify the ideas presented and assess the interests, attitudes, or views contained in those ideas. |   |   |   |   |   |   |   | * |   |   |   |   |
| 4. Identify the stated, implied, or undeclared purpose(s) of a communication. |   |   |   |   |   |   |   | * |   |   |   |   |
| Detecting and Analyzing Arguments |   |   |   |   |   |   |   |   |   |   |   |   |
| 1. Examine a communication and determine whether or not it expresses a reason(s) in support or in opposition to some conclusion, opinion, or point of view. | * | * |   | * | * | * |   | * |   |   |   | * |
| 2. Identify the main conclusions of an argument. | * | * |   | * | * | * |   | * |   |   |   | * |
| 3. Determine if the conclusion is supported with reasons and identify those that are stated or implied. | * | * |   | * | * | * |   | * |   |   |   | * |
| 4. Identify the background information provided to explain reasons that support a conclusion. | * | * |   | * | * | * |   | * |   |   |   | * |
| 5. Identify the unstated assumptions of an argument. | * | * |   | * | * | * |   |   |   |   |   | * |

Table 4 — Evaluation skills measured by critical thinking tests 



| Evaluation | A. Profile | CAAP | CCTDI | CTAB | CCTST | CCTT | COMP | ETS TASKS | MID | PSI | RJI | WGCTA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Assess the importance of an argument and determine if it merits attention. |   |   |   |   | * |   |   | * |   |   |   | * |
| 2. Evaluate an argument in terms of its reasonability and practicality. |   | * |   | * | * | * |   | * |   |   |   | * |
| 3. Evaluate the credibility, accuracy, and reliability of sources of information. |   | * |   | * | * | * |   | * |   |   |   | * |
| 4. Determine if an argument rests on false, biased, or doubtful assumptions. |   | * |   | * | * | * | * | * |   |   |   | * |
| 5. Assess statistical information used as evidence to support an argument. |   | * |   | * | * | * |   |   |   |   |   | * |
| 6. Assess how well an argument anticipates possible objections and offers, when appropriate, alternative positions. |   |   |   |   | * |   |   | * |   |   |   |   |
| 7. Determine how new data might lead to the further confirmation or questioning of a conclusion. |   |   |   |   | * | * |   |   |   |   |   |   |
| 8. Determine and evaluate the strength of an analogy used to warrant a claim or conclusion. |   |   |   |   |   |   |   | * |   |   |   |   |
| 9. Determine if conclusions based on empirical observations were derived from a sufficiently large and representative sample. |   |   |   |   |   | * |   |   |   |   |   |   |
| 10. Determine if an argument makes sense. |   |   |   |   | * | * | * | * |   |   |   | * |
| 11. Assess bias, narrowness, and contradictions when they occur in the person’s point of view. |   | * |   | * | * | * |   |   |   |   |   | * |
| 12. Assess degree to which the language, terminology and concepts employed in an argument are used in a clear and consistent manner. |   | * |   | * | * | * |   |   |   |   |   | * |
| 13. Determine what stated or unstated values or standards of conduct are upheld by an argument and assess their appropriateness to the given context. |   |   |   |   | * | * |   |   |   |   |   | * |
| 14. Judge the consistency of supporting reasons, including their relevancy to a conclusion and their adequacy to support a conclusion. | * | * |   |   | * | * | * | * |   |   |   | * |
| 15. Determine and judge the strength of an argument in which an event(s) is claimed to be the results of another event(s) (causal reasoning). | * | * |   |   | * | * |   |   |   |   |   | * |

Table 5 — Inference skills measured by critical thinking tests 



Inference Skills 


A. 

Profile 


CAAP 


CCTDI 


CTAB 


CCTST 


CCTT 


COMP 


ETS 

TASKS 


MID 


PSI 


RJI 


WG 

CT 

A 


Collecting and 
Questioning Evidence 
1 . Determine what is the 
most significant aspect of 
a problem or issue that 
needs to be addressed, 
prior to collecting 
evidence. 










* 




* 


* 








* 


2. Formulate a plan for 
locating information to 
aid in determining if a 
given opinion is more or 
less reasonable than a 
competing opinion. 














* 


* 




i 






3. Combine disparate 
pieces of information 
whose connection is not 
obvious, but when 
combined offer insight 
into a problem or issues. 
























4. Judge what background 
information would be 
useful to have when 
attempting to develop a 
persuasive argument in 
support of one’s opinion. 
















* 










5. Determine if one has 
sufficient evidence to 
form a conclusion. 










* 


* 












* 


Developing Alternative 
Hypotheses 
1. Seek the opinion of 
others in identifying and 
considering alternatives. 
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Table 5 — Inference skills measured by critical thinking tests — Continued 



Inference Skills 


A. 

Profile 


CAAP 


CCTDI 


CTAB 


CCTST 


CCTT 


COMP 


ETS 

TASKS 


MID 


PSI 


RJI 


WG 

CT 

A 


2. List alternatives and 
consider their pros and 
cons, including their 
plausibility and 
practicality, when making 
decisions or solving 
problems. 
















* 




* 


* 




3. Project alternative 
hypotheses regarding an 
event, and develop a 
variety of different plans 
to achieve some goal. 














* 


* 




* 






4. Recognize the need to 
isolate and control 
variables in order to make 
strong causal claims when 
testing hypotheses. 












* 














5. Seek evidence to 
confirm or disconfirm 
alternatives. 










* 


* 


* 






* 






6. Assess the risks and 
benefits of each 
alternative in deciding 
between them. 
















* 




* 






7. After evaluating the 
alternatives generated, 
develop, when 
appropriate, a new 
alternative that combines 
the best qualities and 
avoids the disadvantages 
of previous alternatives. 





























Drawing Conclusions 
1. Assess how the 
tendency to act in ways to 
generate results that are 
consistent with one’s 
expectations could be 
responsible for 
experimental results and 
everyday observations. 










* 




* 










* 


2. Reason well with 
divergent points of view, 
especially with those with 
which one disagrees, in 
formulating an opinion on 
an issue or problem. 






















* 




3. Develop and use 
criteria for making 
judgments that are 
reliable, intellectually 
strong, and relevant to the 
situation at hand. 










* 


* 


* 


* 






* 


* 


4. Apply appropriate 
statistical inference 
techniques to confirm or 
disconfirm a hypothesis 
in experiments. 










* 


* 




* 








* 


5. Use multiple strategies 
in solving problems 
including means-ends 
analysis, working 
backward, analogies, 
brainstorming, and trial 
and error. 










* 
















6. Seek various 
independent sources of 
evidence, rather than a 
single source of evidence, 
to provide support for a 
conclusion. 














* 




* 


7. Note uniformities or 
regularities in a given set 
of facts, and construct a 
generalization that would 
apply to all these and 
similar instances. 












* 














8. Employ graphs, 
diagrams, hierarchical 
trees, matrices, and 
models as solution aids. 










* 


* 


* 


* 











Table 6 — Presenting arguments skills measured by critical thinking tests 

| Presenting Arguments Skills | A. Profile | CAAP | CCTDI | CTAB | CCTST | CCTT | COMP | ETS Tasks | MID | PSI | RJI | WGCTA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Present supporting reasons and evidence for their conclusion(s) which address the concerns of the audience. |   |   |   | * |   |   | * |   |   |   |   |   |
| 2. Negotiate fairly and persuasively. |   |   |   | * |   |   | * |   | * |   |   |   |
| 3. Present an argument succinctly in such a way as to convey the crucial point of issue. |   |   |   | * |   |   | * | * | * |   |   |   |
| 4. Cite relevant evidence and experiences to support their position. |   |   |   | * |   |   | * | * | * |   |   |   |
| 5. Formulate accurately and consider alternative positions and opposing points of view, noting and evaluating evidence and key assumptions on both sides. |   |   |   | * |   |   |   | * |   | * |   |   |
| 6. Illustrate their central concepts with significant examples and show how these concepts and examples apply in real situations. |   |   |   | * |   |   | * |   | * |   |   |   |

Table 7 — Reflection skills measured by critical thinking tests 

| Reflection Skills | A. Profile | CAAP | CCTDI | CTAB | CCTST | CCTT | COMP | ETS Tasks | MID | PSI | RJI | WGCTA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Apply the skills of their own analysis and evaluation to their arguments to confirm and/or correct their reasoning and results. |   |   | * |   |   |   | * |   |   |   |   |   |
| 2. Critically examine and evaluate their vested interests, beliefs, and assumptions in supporting an argument or judgment. |   |   |   |   |   |   | * |   |   |   |   |   |
| 3. Make revisions in arguments and findings when self-examination reveals inadequacies. |   |   | * |   |   |   | * | * |   |   | * |   |

Table 8 — Dispositions measured by critical thinking tests 

| Dispositions | A. Profile | CAAP | CCTDI | CTAB | CCTST | CCTT | COMP | ETS Tasks | MID | PSI | RJI | WGCTA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Be curious and inquire about how and why things work. |   |   | * |   |   |   |   |   |   | * |   |   |
| 2. Be organized, orderly, and focused in inquiry or in thinking. |   |   | * |   |   |   |   | * |   | * |   |   |
| 3. Willingly persevere and persist at a complex task. |   |   | * |   |   |   |   |   |   | * |   |   |
| 4. Be flexible and creative in seeking solutions. |   |   |   |   |   |   |   | * |   | * |   |   |
| 5. Be inclined to arrive at a reasonable decision in situations where there is more than one plausible solution. |   |   | * |   |   |   |   | * |   | * | * |   |
| 6. Apply insights from cultures other than their own. |   |   |   |   |   |   |   | * |   |   |   |   |
| 7. Exhibit honesty in facing up to their prejudices, biases, or tendency to consider a problem solely from their viewpoint. |   |   | * |   |   |   |   |   |   |   |   |   |
| 8. Monitor their understanding of a situation and progress toward goals. |   |   |   |   |   |   | * |   |   | * |   |   |
| 9. Find ways to collaborate with others to reach consensus on a problem or issues. |   |   |   |   |   |   |   |   |   |   |   |   |
| 10. Be intellectually careful and precise. |   |   | * |   |   |   |   | * |   | * |   |   |
| 11. Value the application of reason and the use of evidence. |   |   | * |   |   |   |   |   |   | * |   |   |
| 12. Be open-minded; strive to understand and consider divergent points of view. |   |   | * |   |   |   |   |   |   | * | * |   |
| 13. Be fair-minded; seek truth and be impartial, even if the findings of an inquiry may not support one’s preconceived opinions. |   |   | * |   |   |   |   |   |   |   | * |   |
| 14. Willingly self-correct and learn from errors made no matter who calls them to our attention. |   |   |   |   |   |   |   |   |   |   |   |   |

In reviewing tables 2-8, it should be noted that no single test measures every aspect of 
critical thinking. In fact, even with all of the tests combined, not all critical thinking skills are assessed. 
Although no test is comprehensive when compared with the Jones et al. definition, many tests are 
still adequate measures of some critical thinking skills. Analyses of these particular tests can be found in 
the test templates at the end of this section. 



2.3 Definition of Problem Solving 

The ability to solve problems has been defined through a consensus of college and university 
faculty members, employers, and policymakers. The resulting definition produced by Jones et al. (1997) 
will be used as a base for examining the scope of problem-solving assessments reviewed within this 
report. Problem solving is defined as understanding the problem, being able to obtain background 
knowledge, generating possible solutions, identifying and evaluating constraints, choosing a solution, 
functioning within a problem-solving group, evaluating the process, and exhibiting problem-solving 
dispositions. Only three tests were identified as addressing problem-solving skills: the ACT College 
Outcomes Measures Program (COMP) problem-solving subscale, the ETS Tasks in Critical Thinking, 
and the Problem Solving Inventory (PSI). The PSI, when compared to the Jones et al. definition, was not 
found to assess any of the skills; therefore, only the COMP and ETS tests were included in the 
comparison. The full definition follows in table 9. Again, the process used to determine if tests measured 
a skill was subjective and based on the authors’ claims; therefore, the results presented in table 9 should 
be interpreted cautiously. The test templates at the end of this section include in-depth reviews of the 
problem-solving tests. 

From the definition table, it is evident that there is not an adequate measure of problem- 
solving skills and that the most comprehensive measure is the ETS Tasks in Critical Thinking. These 
tasks are purported to measure critical thinking, yet also address many of the skills of problem solving. 
This highlights the considerable overlap between critical thinking and problem solving. 
For instance, the ability to state a problem; evaluate factors surrounding the problem; create, implement, 
and adjust solutions as needed; and analyze the process and fit of a solution — as well as having an active 
inclination towards thinking, solving problems, and being creative — are all skills necessary for both 
problem solving and critical thinking. Therefore, the clear distinctions between problem solving and 
critical thinking exhibited in the definition by Jones et al. may prove difficult to assess and tease apart in 
application. 



Perhaps the most important element in measuring critical thinking or problem solving at the 
college level is the choice of a clear, comprehensive definition to steer the assessment process. If, for 
instance, the purpose of testing is to assess effectiveness in a general education program, then the 
definition should match the curriculum objectives identified and resemble the students’ classroom 
experiences. Once a firm definition is determined and the purpose of testing is known, conceptual and 
methodological considerations can be evaluated. Test users should understand the limitations of particular 
tests to assess a broad range of skills and incorporate these limitations into the assessment plan. The test 
format, multiple-choice or constructed response, is another consideration affecting the types of inferences 
that can be made and the data generated. In essence, there are many complex issues to evaluate; therefore, 
it is recommended that an assessment specialist always be contacted and included in the testing process. 



Table 9 — Problem-solving skills as measured by the COMP and ETS Tasks in Critical Thinking 



Problem-Solving Skills 


COMP 


ETS Tasks 


Understanding the Problem 
Recognize the problem exists. 


* 


* 


Determine which facts are known in a problem situation and which are uncertain. 




* 


Summarize the problem to facilitate comprehension and communication of the 


* 


* 


problem. 






Identify different points of view inherent in the representation of the problem. 




* 


Identify the physical and organizational environment of the problem. 




* 


Describe the values that have a bearing on the problem. 

Identify time constraints associated with solving the problem. 

Identify personal biases inherent in any representation of the problem. 


* 


* 


Obtaining Background Knowledge 

Determine if they have the background information to solve the problem. 




* 


Apply general principles and strategies that can be used in the solution of other 


* 


* 


problems. 

Use visual imagery to help memorize and recall information. 

Identify what additional information is required and where it can be obtained. 


* 


* 


Develop and organize knowledge around the fundamental principles associated 




* 


with a particular discipline. 

Develop and organize knowledge around the fundamental principles associated 




* 


across functions or disciplines. 

Use systematic logic to accomplish their goals. 


* 


* 


Evaluate arguments and evidence so that competing alternatives can be assessed 




* 


for their relative strengths. 

Organize related information into clusters. 




* 


Recognize patterns or relationships in large amounts of information. 




* 


Use analogies and metaphors to explain a problem. 






Identify persons or groups who may be solving similar problems. 






Obtaining Background Knowledge — Continued 
Identify time constraints related to problem solving. 






Identify financial constraints related to problem solving. 






Use clear, concise communication to describe a problem. 


* 


* 


Generate Possible Solutions 
Think creative ideas. 




* 


List several methods that might be used to achieve the goal of the problem. 


* 


* 


Be flexible and original when using experiences to generate possible solutions. 






Use brainstorming to help generate solutions. 






Divide problems into manageable components. 




* 


Isolate one variable at a time to determine if that variable is the cause of the 
problem. 






Develop criteria that will measure success of solutions. 


* 


* 


Determine if cost of considering additional alternatives is greater than the likely 
benefit. 






Measure progress toward a solution. 






Identifying and Evaluating Constraints 

List the factors that might limit problem-solving efforts. 






Question credibility of one’s own assumptions. 




* 


Recognize constraints related to possible solutions. 






Apply consistent evaluative criteria to various solutions. 


* 


* 


Utilize creative and original thinking to evaluate constraints. 






Choosing a Solution 

Reflect upon possible alternatives before choosing a solution. 


* 


* 


Use established criteria to evaluate and prioritize solutions. 


* 


* 


Draw on data from known effective solutions of similar problems. 




* 


Evaluate possible solutions for both positive and negative consequences. 




* 


Choosing a Solution — Continued 
Explore a wide range of alternatives. 


* 


* 






Form a reasoned plan for testing alternatives. 

Work to reduce the number of alternatives from which they choose a solution. 
Analyze alternatives to determine if most effective options have been selected. 
Identify deficiencies associated with solutions and how they may be resolved. 
Explain and justify why a particular solution was chosen. 

Prioritize the sequence of steps in a solution. 


* 

* 


* 

* 


Group Problem Solving 

Identify and explain their thought processes to others. 

Be patient and tolerant of differences. 

Understand there may be many possible solutions to a problem. 

Use discussion strategies to examine a problem. 

Channel disagreement toward resolution. 

Fully explore the merits of innovation. 

Pay attention to feelings of all group members. 

Identify and manage conflict. 

Identify individuals who need to be involved in problem solving process. 
Search for aids or methods to reach agreement. 

Integrate diverse viewpoints. 

Stimulate creativity rather than conformity. 

Listen carefully to others’ ideas. 

Understand and communicate risks associated with alternative solutions. 
Work on collaborative projects as a member of a team. 






Evaluation 

Choose solutions that contain provisions for continuous improvement. 
Seek alternative solutions if goals aren’t achieved. 

Determine and review steps in implementation. 

Seek support for solutions. 

Revise and refine solutions during implementation. 

Determine if their solutions integrate well with other solutions. 


* 


* 


Dispositions 
Learn from errors. 

Work within constraints. 

Actively seek information. 

Take responsible risks. 

Remain adaptable and flexible when implementing solutions. 
Think creatively. 




* 


Search outside their expertise for solutions. 




* 
TEMPLATES — CRITICAL THINKING AND PROBLEM SOLVING 

Critical Thinking Methods 

Tasks in Critical Thinking Scoring Rubrics 

Core scoring method — Analysis and inquiry 

3. WRITING 



3.1 Introduction 

An effective and meaningful evaluation of postsecondary writing assessments is predicated 
upon a comprehensive understanding of the definition of writing competency. Therefore, the writing part 
of this sourcebook begins with an overview of existing approaches to the definition of writing. This 
preliminary segment also contains a table highlighting the writing skill components measured by several 
existing postsecondary writing tests. In the second section, descriptions of different types of formats used 
to assess writing competency — both directly and indirectly — are provided, with consideration of the 
advantages and disadvantages of each method. This section closes with a discussion of computerized 
writing assessment and an exploration of some global issues relevant to all postsecondary writing 
assessment efforts. Finally, to further aid individuals in the selection of a useful writing assessment, 
details of each existing measure (scoring, author/publisher, testing time, date, cost, specific purposes, 
current users, details related to the utility, and psychometric properties, as well as the scale definition and 
rubrics) are displayed in the context of a comprehensive chart. 



3.2 Definition of Writing 

Although writing is clearly a form of communication that connotes activity and change, 
attempts to define writing often focus on the products (essays, formal reports, letters, scripts for speeches, 
step-by-step instructions, etc.) or the content of what has been conveyed to whom. When writing is 
defined only as a product, elaboration of the construct tends to entail specification of whether particular 
elements, such as proper grammar, variety in sentence structure, organization, etc., are present (suggestive 
of higher quality writing) or absent (indicative of lower quality writing). Attention is given to describing 
exactly what is generated and detailing the skill proficiencies needed to produce a given end-product. 
Although educators, researchers, and theorists in the writing field tend to prefer a process-oriented 
conceptualization of writing, research suggests that employers in industry are more interested in defining 
writing competence with reference to products (Jones et al. 1995). Section 3.4 (see below) provides a 
brief summary of the history of process theory in writing assessment. 

In a report on national assessment of college student learning, Jones et al. (1995) provided a 
comprehensive definition of writing, which, in addition to including several subcomponents of the 
process, delineates critical aspects of written products. The general categories of key elements composing 
the construct of writing produced by these authors include awareness and knowledge of audience, purpose 
of writing, prewriting activities, organizing, drafting, collaborating, revising, features of written products, 
and types of written products. These researchers developed this definition based on an extensive review of 
relevant literature and feedback from a large sample of college and university faculty members, 
employers, and policymakers representative of all geographic regions in the United States. Stakeholders 
were asked to rate the importance of achieving competency on numerous writing skills upon completion 
of a college education. Jones et al. found that in every area of writing there were certain skills that each 
respondent group believed were essential for college graduates to master in order to facilitate effective 
functioning as employees and citizens. However, there were areas of contention as well. For example, 
employers and policymakers placed less emphasis on the importance of the revision process, tending to 
expect their graduates to be able to produce high-quality documents on the first attempt. In addition, 
employers rated the ability to use visual aids, tables, and graphs as more important than did faculty 
members, and faculty members attached more importance to being able to write abstracts and evaluations. 
The resulting definition produced by Jones et al., which includes only skills that were 
endorsed by all three groups, is distinct from other definitions in that it is based on a consensus derived 
empirically from groups that possess very different interests regarding the development of writing skill 
competency through undergraduate training. The Jones et al. definition will, therefore, be used as a base 
for examining the scope of the writing assessments to be surveyed herein. 

Table 10 provides a detailed list of all of the subcomponents addressed in the definition, in 
addition to an indication of which currently available measures assess particular components. Only 
multiple-choice and essay tests are included in the table, because the rubrics used with most portfolio 
measures tend to only address very global dimensions of writing quality. 



Table 10 — Dimensions of writing reflected in assessment methods 

Multiple-Choice Tests 



Components | CLEP | SAT-II | AP-Eng. Comp. | CAAP | A. Profile | COMPASS | TASP | CLAST 


Awareness and Knowledge of Audience 














* 




1. Consider how an audience will use the 
document. 

2. Choose words that their audience can 
understand. 

3. Understand the relationship between the 
audience and the subject material. 

4. Address audiences whose cultural and 
communication norms may differ from those of 
the writer. 

5. Clearly understand their audiences’ values, 
attitudes, goals, and needs. 

6. Understand the relationship between the 
audience and themselves. 

Other dimensions are covered generally. 


Purpose of Writing 


* 


* 










* 


* 


1. State their purpose(s) to their audience. 

2. Use vocabulary appropriate to their subject and 
purpose(s). 

3. Arrange words within sentences to fit the 
intended purpose(s) and audiences. 

4. Make appropriate use of creative techniques of 
humor and eloquence when approaching a writing 
task. 

5. Draw on their individual creativity and 
imagination to engage their audience. 

Other dimensions are covered generally. 








Prewriting Activities 


















1. Discuss their piece of writing with someone to 
clarify what they wish to say. 

2. Research their subject. 

3. Identify problems to be solved that their topic 
suggests. 

Other dimensions are covered generally. 


















Organization 


















1. Organize the material for more than one 
audience. 

2. Include clear statements of the main ideas. 


* 
















3. Demonstrate their method of organization to 
their audience(s) by using informative headings. 

4. Write informative headings that match their 
audiences’ questions. 

5. Maintain coherence within sentence. 


* 


* 




* 


* 








6. Maintain coherence among sentences, 


* 






* 


* 








paragraphs, and sections of a piece of writing. 

7. Develop patterns of organization for their ideas. 

8. Use knowledge of potential audience 
expectations and values to shape a text. 

9. Create and use an organizational plan. 

10. Organize their writing in order to emphasize 
the most important ideas and information within 
sentences and larger units such as paragraphs. 


* 
















11. Cluster similar ideas. 

12. Provide a context for the document in the 
introduction. 

13. Set up signposts such as table of contents, 
indexes, and side tabs. 

14. Demonstrate patterns of reasoning in their 
writing. 

Other dimensions are covered generally. 


* 






* 


* 


* 


* 








Drafting 


















1. Avoid common grammatical errors of standard 
written English. 

2. Quote accurately. 

3. Establish and maintain a focus. 

4. Write effective introductions and conclusions. 

5. Write effectively under pressure and meet 
deadlines. 

6. Make general and specific revisions while they 
write their drafts. 

7. Move between reading and revising of their 
drafts to emphasize key points. 

8. Refine the notion of audience(s) as they write. 
Other dimensions are covered generally. 


















Collaborating 


















1. Collaborate with others during reading and 
writing in a given situation. 

Other dimensions are covered generally. 


















Revising 


















1. Correct grammar problems. 

2. Revise to improve word choice. 

3. Select, add, substitute, or delete information for 
a specified audience. 










* 








4. Reduce awkward phrasing and vague language. 
Other dimensions are covered generally. 










* 








Features of Written Products 


















1. Use active or passive voice where appropriate. 

2. Use language their audience understands. 

3. Define or explain technical terms. 


* 






* 






* 




4. Use concise language. 

5. Use correct grammar, syntax (word order), 
punctuation, and spelling. 


* 


* 




* 


* 


* 


* 


* 


6. Use correct reference forms. 














* 




7. Use the specific language conventions of their 
academic discipline or professional area. 

Other dimensions are covered generally. 












* 


* 


* 








Written Products 


















1. Write memoranda. 

2. Write letters. 

3. Write formal reports. 

4. Write summaries of meetings. 

5. Write scripts for speeches/presentations. 

6. Complete pre-printed forms that require written 
responses. 

7. Write step-by-step instructions. 

8. Write journal articles. 

9. Write policy statements. 

Other dimensions are covered generally. 


















Other 

1. Style. 

2. Avoidance of figurative language. 

3. Shifts in construction. 

4. Analyzing rhetoric. 

5. Ambiguity/wordiness. 


* 


* 


* 


* 


* 


* 






6. Insightful support for ideas. 


* 
















7. Point of view exemplified. 

8. Maintenance of a consistent tone. 

9. Effective opening and closing. 

10. Avoidance of generalizations, cliches. 

11. Awareness, insight into complexities of 
prompt. 

12. Separating relevant from irrelevant 
information. 


* 
















13. Depth, complexity of thought. 

14. Sentence variety. 


* 
Table 10 — Dimensions of writing reflected in assessment methods — Continued 

Local Essay Tests and Commercial Essay Tests 

Components | TASP | CLAST | SEEW | IIEP | NJCBSPT | SMSU | College Base | Praxis I 




















Awareness and Knowledge of Audience 


















1. Consider how an audience will use the 


















document. 


















2. Choose words that their audience can 


















understand. 


* 
















3. Understand the relationship between 
the audience and the subject material. 

4. Address audiences whose cultural and 


* 
















communication norms may differ from 
those of the writer. 

5. Clearly understand their audiences’ 
values, attitudes, goals, and needs. 

6. Understand the relationship between 
the audience and themselves. 

Other dimensions are covered generally. 


* 
















Purpose of Writing 


















1. State their purpose(s) to their 




* 




* 


* 


* 




* 


audience. 

2. Use vocabulary appropriate to their 
subject and purpose(s). 


* 


* 










* 




3. Arrange words within sentences to fit 
the intended purpose(s) and 
audiences. 


* 


* 














4. Make appropriate use of creative 
techniques of humor and eloquence when 
approaching a writing task. 










* 








5. Draw on their individual creativity and 
imagination to engage their audience. 
Other dimensions are covered generally. 






* 


* 


* 


* 




* 


Prewriting Activities 

1. Discuss their piece of writing with 
someone to clarify what they 

wish to say. 

2. Research their subject. 

3. Identify problems to be solved that 
their topic suggests. 

Other dimensions are covered generally. 










Organization 

1. Organize the material for more than 
one audience. 

2. Include clear statements of the main 
ideas. 

3. Demonstrate their method of 
organization to their audience(s) by using 
informative headings. 

4. Write informative headings that match 
their audiences’ questions. 

5. Maintain coherence within sentence. 






* 




* 


* 






6. Maintain coherence among sentences, 
paragraphs, and sections of a piece of 
writing. 




* 


* 






* 




* 


7. Develop patterns of organization for 


* 


* 


* 


* 




* 






their ideas. 

8. Use knowledge of potential audience 
expectations and values to shape a text. 

9. Create and use an organizational plan. 

10. Organize their writing in order to 


* 


* 


* 


* 








* 


emphasize the most important ideas and 
information within sentences and larger 
units such as paragraphs. 

11. Cluster similar ideas. 








* 


* 






* 


12. Provide a context for the document in 
the introduction. 

13. Set up signposts such as table of 
contents, indexes, and side tabs. 

14. Demonstrate patterns of reasoning in 
their writing. 

Other dimensions are covered generally. 


* 


* 


* 


* 


* 


* 


* 


* 


Drafting 


















1. Avoid common grammatical errors of 


* 


* 


* 






* 






standard written English. 

2. Quote accurately. 

3. Establish and maintain a focus. 

4. Write effective introductions and 
conclusions. 

5. Write effectively under pressure and 
meet deadlines. 

6. Make general and specific revisions 
while they write their drafts. 

7. Move between reading and revising of 
their drafts to emphasize key points. 

8. Refine the notion of audience(s) as 
they write. 

Other dimensions are covered generally. 


* 














! 


Collaborating 

1. Collaborate with others during reading
and writing in a given situation. 

Other dimensions are covered generally. 


















Revising 

1. Correct grammar problems.

2. Revise to improve word choice. 

3. Select, add, substitute, or delete 
information for a specified audience. 

4. Reduce awkward phrasing and vague 
language. 

Other dimensions are covered generally. 










4 








Features of Written Products 

1. Use active or passive voice where
appropriate. 

2. Use language their audience 
understands. 

3. Define or explain technical terms. 

4. Use concise language. 




* 




* 










5. Use correct grammar, syntax (word 
order), punctuation, and spelling. 

6. Use correct reference forms. 


* 


* 


* 


* 


* 


* 


* 


* 


7. Use the specific language conventions 
of their academic discipline or 
professional area. 












* 






Other dimensions are covered generally. 




* 


* 


* 


* 


* 




* 
Table 10 — Dimensions of writing reflected in assessment methods — Continued



Local Essay Tests

Commercial Essay Tests



Components 


TASP 


CLAST 


SEEW 


IIEP 


NJCBSPT 


SMSU 


College Base


Praxis I 


Written Products 

1. Write memoranda. 

2. Write letters. 

3. Write formal reports. 

4. Write summaries of meetings. 

5. Write scripts for speeches or 
presentations. 

6. Complete pre-printed forms that 
require written responses. 

7. Write step-by-step instructions. 

8. Write journal articles. 

9. Write policy statements. 

Other dimensions are covered generally. 


















Other 

1. Style. 

2. Avoidance of figurative language. 

3. Shifts in construction. 

4. Analyzing rhetoric. 

5. Ambiguity/wordiness. 






* 






* 




/ 








6. Insightful support for ideas. 

7. Point of view exemplified. 








* 


* 




* 


* 


8. Maintenance of a consistent tone. 










* 








9. Effective opening and closing. 

10. Avoidance of generalizations, 
cliches. 

11. Awareness, insight into complexities
of prompt. 

12. Separating relevant from irrelevant 
information. 

13. Depth, complexity of thought. 

14. Sentence variety. 


* 


* 






* 






* 
Table 10 — Dimensions of writing reflected in assessment methods — Continued









Commercial Essay Tests






Components 


COMP 


A. 

Profile 


CAAP 


MCAT 


TWE 


GMAT 


SAT-II 


CLEP 


Awareness and Knowledge of Audience 
1. Consider how an audience will use the
document. 

2. Choose words that their audience can 
understand. 

3. Understand the relationship between the 
audience and the subject material. 

4. Address audiences whose cultural and 
communication norms may differ from 
those of the writer. 

5. Clearly understand their audiences’ 
values, attitudes, goals, and needs. 

6. Understand the relationship between the 
audience and themselves. 

Other dimensions are covered generally. 


* 
















Purpose of Writing 

1. State their purpose(s) to their audience.

2. Use vocabulary appropriate to their 
subject and purpose(s). 

3. Arrange words within sentences to fit 




* 


* 










* 


the intended purpose(s) and audience. 

4. Make appropriate use of creative 
techniques of humor and eloquence when 
approaching a writing task. 

5. Draw on their individual creativity and 
imagination to engage their audience. 
Other dimensions are covered generally. 






* 


* 




* 






Prewriting Activities 

1. Discuss their piece of writing with
someone to clarify what they wish to say. 

2. Research their subject. 

3. Identify problems to be solved that their 
topic suggests. 

Other dimensions are covered generally. 
Table 10 — Dimensions of writing reflected in assessment methods — Continued









Commercial Essay Tests






Components 


COMP 


A. 

Profile 


CAAP 


MCAT 


TWE 


GMAT 


SAT-II 


CLEP 


Organization 

1. Organize the material for more than one
audience. 






* 












2. Include clear statements of the main 
ideas. 

3. Demonstrate their method of 
organization to their audience(s) by using 
informative headings. 

4. Write informative headings that match 
their audiences’ questions. 

5. Maintain coherence within sentences.

6. Maintain coherence among sentences, 
paragraphs, and sections of a piece of 






* 


* 










writing. 






* 


* 








* 


7. Develop patterns or organization for 
their ideas. 

8. Use knowledge of potential audience 
expectations and values to shape a test. 

9. Create and use an organizational plan. 

10. Organize their writing in order to 
emphasize the most important ideas and 
information within sentences and larger 








* 










units such as paragraphs. 








* 










11. Cluster similar ideas.

12. Provide a context for the document in
the introduction.

13. Set up signposts such as tables of
contents, indexes, and side tabs.




* 




* 










14. Demonstrate patterns of reasoning in 
their writing. 

Other dimensions are covered generally. 


* 


* 


* 


* 


* 


* 


* 


* 






Table 10 — Dimensions of writing reflected in assessment methods — Continued 





Commercial Essay Tests


Components 


COMP 


A. 

Profile 


CAAP 


MCAT 


TWE 


GMAT 


SAT-II 


CLEP 


Drafting 


















1. Avoid common grammatical errors of




* 














standard written English. 

2. Quote accurately. 

3. Establish and maintain a focus. 

4. Write effective introductions and 
conclusions. 

5. Write effectively under pressure and 
meet deadlines. 

6. Make general and specific revisions 
while they write their drafts. 

7. Move between reading and revising of 
their drafts to emphasize key points. 

8. Refine the notion of audience(s) as they 
write. 

Other dimensions are covered generally. 




* 










i 




Collaborating 

1. Collaborate with others during reading 
and writing in a given situation. 

Other dimensions are covered generally.


















Revising 

1. Correct grammar problems.

2. Revise to improve word choice. 

3. Select, add, substitute, or delete 
information for a specified audience. 

4. Reduce awkward phrasing and vague 
language. 

Other dimensions are covered generally.


















Features of Written Products 


















1. Use active or passive voice where
appropriate. 

2. Use language their audience 
understands. 


* 










* 






3. Define or explain technical terms. 

4. Use concise language. 

5. Use correct grammar, syntax (word 
order), punctuation, and spelling. 

6. Use correct reference forms. 

7. Use the specific language conventions 
of their academic discipline or professional 


* 


* 


* 


* 


* 


* 


* 


* 


area. 


* 






* 


* 


* 


* 


* 


Other dimensions are covered generally. 










. 
Table 10 — Dimensions of writing reflected in assessment methods — Continued









Commercial Essay Tests






Components 


COMP 


A. 

Profile 


CAAP 


MCAT 


TWE 


GMAT 


SAT-II 


CLEP 


Written Products 

1. Write memoranda.

2. Write letters. 

3. Write formal reports. 

4. Write summaries of meetings. 

5. Write scripts for speeches/presentations. 

6. Complete pre-printed forms that require 
written responses. 

7. Write step-by-step instructions. 

8. Write journal articles. 

9. Write policy statements. 

Other dimensions are covered generally. 


















Other 

1. Style. 

2. Avoidance of figurative language. 

3. Shifts in construction. 

4. Analyzing rhetoric. 

5. Ambiguity/wordiness. 

6. Insightful support for ideas. 

7. Point of view exemplified. 

8. Maintenance of a consistent tone. 

9. Effective opening and closing. 




* 


* 




* 


* 


* 


* 


10. Avoidance of generalizations, cliches. 




* 














11. Awareness, insight into complexities
of prompt. 

12. Separating relevant from irrelevant 
information. 




* 




* 










13. Depth, complexity of thought. 

14. Sentence variety. 








* 


* 


*


* 





Key to Abbreviations:

CLEP — College-Level Examination Program
SAT-II — Scholastic Aptitude Test
AP — Advanced Placement
CAAP — Collegiate Assessment of Academic Proficiency
COMPASS — Computerized Adaptive Placement Assessment and Support System
TASP — Texas Academic Skills Program
CLAST — College-Level Academic Skills Test
SEEW — Scale for Evaluating Expository Writing
IIEP — Illinois Inventory of Educational Progress
NJCBSPT — New Jersey College Basic Skills Placement Test
COMP — College Outcome Measures Program
MCAT — Medical College Admission Test
TWE — Test of Written English
GMAT — Graduate Management Admission Test

3.3 Issues Relevant to Writing Assessment



The Portfolio Approach 

In response to the many concerns regarding essay tests, several writing professionals have 
advocated portfolio assessment as a viable alternative to the timed essay. In portfolio assessment, already 
constructed documents are used instead of generating new ones. Advocates of the portfolio approach 
emphasize the use of “real writing” not produced under artificial conditions, the ability to track the 
development of student abilities over time, congruence with the process model, and the enhanced 
opportunities to measure writing defined in terms of higher-order thinking. Murphy (1994) notes that 
portfolios represent curricula products and, as such, they provide a wealth of information regarding 
experiences in the classroom (both the course content and the manner in which it is communicated). 
Murphy further points out that because portfolios indirectly reveal a wealth of information pertaining to 
the philosophical assumptions and beliefs about teaching and learning that frame educational experiences, 
reflective analysis of portfolio contents can aid both teachers and policymakers seeking to enhance the 
quality of instruction. 

However, White (1993) noted that portfolio assessment gives rise to a host of issues
that were not previously encountered in writing assessment. For instance, decisions must be made 
regarding (1) what is to be included in the portfolio, (2) who is responsible for collection and verification 
of materials, (3) what kind of scoring is practically possible, (4) how upper-level assessment can be made 
fair to students coming from majors requiring varying amounts of writing, (5) whether the original 
instructor’s grades and comments should remain on the submissions, and (6) what the most appropriate 
methods are to employ for demonstrating reliability and validity. 

Shortcomings associated with the portfolio approach as it is commonly implemented are 
beginning to be identified as well. For example, Witte et al. (1995) have voiced concern that portfolio 
assessment is often oriented toward the performance of school tasks that may not correlate with 
workplace and citizenship tasks, rendering portfolio assessments incongruent with the forms of 
assessment advocated by the National Education Goals Panel through America 2000. Reliability has also 
been a particularly problematic issue with portfolio assessment. Although holistic scoring is the most 
frequently applied scoring approach, this method can be potentially problematic in that readers must 
examine several samples, often written within many different genres and intended for a number of 
different audiences and purposes with discrepant levels of success, and then must score the whole set of 
writing samples on a single scale (Callahan 1995). With several different types of writing included in the 
portfolio, the rubrics must be general enough to capture the essence of good writing across multiple 
forms; and with less specificity in the rubric anchor points, interpretation becomes more open to judgment 
and is likely to compromise inter-rater reliability. Callahan (1995) outlined additional problems with the 
portfolio approach, including competency of readers for evaluating a wide variety of writing forms and 
the impact of the order of pieces on the reader. The complexity, expense, and labor-intensive nature of 
portfolios are discussed by Callahan as well. 

Finally, it is vital to remain cognizant of the fact that when direct assessment techniques are
applied to the measurement of writing skills, they represent true direct measures only to the extent that
the skills of interest are actually reflected in the written products (Powers, Fowles, and Willard 1994).
Moreover, as pointed out by Messick (1992, cited in Powers, Fowles, and Willard 1994), skills and
knowledge cannot in actuality be measured directly; there is always an inference
from performances and products to underlying abilities, even when the methods seem to be the most direct
or authentic.
Writing Competency



Adherents of a single factor model of writing ability would argue that attempts to delineate 
skills characteristic of effective writing result in a limited perspective devoid of an appreciation for the 
synthesis of capacities that emerge during the act of writing. The multifactor approach, on the other hand, 
is derived from the premise that writing ability is based on the learning and development of discrete skills 
that can be identified individually. The manner in which one conceptualizes writing ability has 
implications regarding assessment that will be discussed below. 



Holistic Scoring 



Proponents of a global definition of writing ability are typically strong advocates of holistic
rating scales that are believed to capture the overall essence or quality of writing products. As noted by 
Breland et al. (1987), the primary assumption underlying holistic scoring is that the whole composition is 
more than the sum of its parts. According to Cooper (1977), holistic scoring involves matching a written 
document with a graded series of writing samples, scoring a document for evidence of features central to 
a particular type of writing, or assigning a letter or number grade. Moreover, according to Cooper, the 
assessment should transpire quickly and “impressionistically” following training. 

Holistic scoring, which yields one general numerical rating of the overall quality of a writing
product, possesses the obvious benefit of speed, rendering it more practical than the analytic scoring
approach, which requires ratings on several different factors. Efficiency in scoring is an important 
consideration when assessments are large; yet a critical limitation of the holistic approach is the lack of 
diagnostic information produced pertaining to individual students’ strengths and weaknesses. 

Carlson and Camp (1985) have pointed out that despite rigorous efforts devoted to training 
scorers, there is always some degree of subjective judgment involved in holistic ratings; and these 
personal judgments may be particularly problematic when the writer and the scorers possess discrepant 
sets of cultural conventions and expectations. Research has also shown that ratings are affected by the 
type of writing scored, by various personality dimensions of the writer, and even by personality attributes 
of the scorer (Carrell 1995). For example, Carrell found that narrative essays tended to be rated more 
highly than argumentative pieces, the essays of introverts were often rated higher than those of extraverts, 
and feeling-oriented raters tended to give higher scores than their “thinking-oriented” counterparts. 
Interestingly, in Carrell’s work, there was a lack of significant differences between the scores of raters
who were trained versus those who were untrained, raising questions pertaining to the impact and utility 
of training. 



Elbow and Yancey (1994) have suggested that holistic scoring is based on the potentially 
erroneous assumption that a complex, multidimensional performance can be reduced to a single
quantitative dimension. Although this scoring methodology was developed to preserve and capture the 
essence of the entire writing sample, it may ironically turn out to be far more reductionistic than the 
analytic approach, which at least captures the quality of writing on separate dimensions. 

When single holistic scores are used, it is critically important for readers to agree on how to 
score essays that present skill discrepancies, as when the mechanics and ideas developed are good, but the 
organization is poor (Carlson and Camp 1985). Carlson and Camp raise another potentially problematic 
situation that can arise in the context of holistic scoring. Specifically, there must be agreement on issues 
such as how to rate attempts to compose complex sentences that contain errors versus refraining from the 
use of complex sentences and presenting correct but simple sentences. Compromised reliability is one of 
the most frequently cited disadvantages of holistic scoring. Unfortunately, the most commonly employed 
estimate of reliability with holistically scored essays is inter-rater reliability, which actually tends to be an
inflated estimate, suggesting that reliability may be a problem of greater magnitude than it seems at first
glance.



The reliability of holistic scales can be enhanced substantially by designing rubrics with 
scale points that are clearly defined and differentiated with objective criteria, as opposed to using vague 
descriptors that are open to subjective interpretation. The inclusion of more than one essay requirement 
and the use of multiple raters should also increase the reliability of holistically scored tests. 
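
The effect of adding raters can be illustrated with a short calculation. The Python sketch below is not drawn from the studies cited above. It assumes two hypothetical raters scoring eight essays on a six-point holistic scale, and it uses the Spearman-Brown formula from classical test theory as one common way to project the reliability of a score averaged over several raters. All function names and scores are illustrative.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def exact_agreement(x, y):
    """Proportion of essays on which both raters gave the same score."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def spearman_brown(r, k):
    """Projected reliability when k comparable raters are averaged,
    given a single-rater reliability estimate r."""
    return k * r / (1 + (k - 1) * r)


# Hypothetical holistic scores (1-6) from two raters on eight essays.
rater_a = [4, 3, 5, 2, 4, 6, 3, 5]
rater_b = [4, 2, 5, 3, 4, 5, 3, 6]

r = pearson(rater_a, rater_b)
print("inter-rater correlation:", round(r, 3))
print("exact agreement:", exact_agreement(rater_a, rater_b))
print("projected reliability, 2 raters:", round(spearman_brown(r, 2), 3))
print("projected reliability, 3 raters:", round(spearman_brown(r, 3), 3))
```

The projection assumes that additional raters are interchangeable with the first two, which may not hold when raters differ in training or in cultural expectations.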



Analytic Scoring 



Those who view writing as a set of distinct skills rather than as a global generalized ability 
tend to prefer analytic scoring methods, based on the notion that individual writers may have strengths in 
some areas and deficiencies in others. In analytic scoring, the traits of good writing are broken down into 
categories such as organization, development, awareness of the audience, mechanics, and coherence. 
Within each category the rater makes a judgment regarding how the paper fares on each of the particular 
dimensions using a numerical scale typically ranging from a high of “5” or “6” to a low of “1.” Each 
subscale is usually accompanied by a rubric containing detailed descriptors of the characteristics of essays 
meriting a particular score. Scores on the subscales are then typically added to derive a total score. 
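
The procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any test publisher's scoring routine: the five trait names come from the categories listed above, the scale is assumed to run from 1 to 6, and the function names and sample ratings are hypothetical.

```python
# Trait names follow the categories named in the text; the 1-6 range is an
# assumption (the text notes that the top of the scale is typically 5 or 6).
TRAITS = ("organization", "development", "audience_awareness",
          "mechanics", "coherence")
SCALE_MIN, SCALE_MAX = 1, 6


def analytic_total(ratings):
    """Validate one rater's subscale ratings and return their sum."""
    missing = [t for t in TRAITS if t not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for trait in TRAITS:
        if not SCALE_MIN <= ratings[trait] <= SCALE_MAX:
            raise ValueError(f"{trait} rating out of range: {ratings[trait]}")
    return sum(ratings[t] for t in TRAITS)


def weakest_traits(ratings):
    """Return the trait(s) with the lowest rating, for diagnostic feedback."""
    low = min(ratings[t] for t in TRAITS)
    return [t for t in TRAITS if ratings[t] == low]


# Hypothetical ratings for a single essay.
essay = {"organization": 5, "development": 4, "audience_awareness": 3,
         "mechanics": 6, "coherence": 4}

print("total score:", analytic_total(essay))
print("weakest trait(s):", weakest_traits(essay))
```

Keeping the subscale ratings alongside the total is what preserves the diagnostic information that a single holistic score discards.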

Because analytic scoring yields more scores than holistic scoring, this methodology is not only
more useful for assessing various dimensions of individual students’ abilities, but also
potentially more valuable for prescribing educational interventions for individuals. Further, in cases
where several students exhibit similar patterns of deficits, assessment can lead to curriculum reform. In a 
review of holistic versus analytic scoring, Huot (1990) reported that analytic scales tend to have higher 
reliability estimates than holistic methods. 

One of the most frequently cited disadvantages of analytic scoring
pertains to the increased time needed for development of the scales and for the actual scoring of essays. Also,
opponents of analytic scoring often voice concerns related to missing an assessment of the writing sample 
as a unified whole, when the components of successful writing are broken down into smaller units. On a 
slightly different note, Carlson and Camp (1985) remind us that the reader’s general impression often 
influences ratings on separate dimensions, thereby rendering the advantage of useful separate score 
information potentially less meaningful. 



Computerized Writing Assessment 



Computer-administered writing assessments are not yet widespread;
however, computer-adaptive testing is becoming increasingly prevalent. For example, the COMPASS
Writing Skills Placement Test developed by ACT is a multiple-choice, objective test of writing skills that 
requires the student to find and correct errors in essays, without any prompting pertaining to the regions 
of the essays containing flawed segments. ACT plans to have an essay segment available in the future. 
Advances are also being made in the development of computerized writing assessment programs that 
allow for computerized scoring through counting and analysis of targeted numeric indicators in text files. 
The Computerized Inventory of Developmental Writing Traits (CIDWT), developed by a research team 
from the Alaska Writing Program headed by McCurry (see McCurry 1992) provides an efficient, 
inexpensive means for scoring large numbers of essays with reference to fluency, sentence development, 
word choice, and paragraph development. Computerized scoring of essays is likely to provide a valid 
addition to the available measures, particularly in view of the fact that scores on the CIDWT have been 
found to correlate highly with teacher ratings. However, it is unlikely that computerized scoring will be 
able to assess all of the essential components of effective writing. Qualities such as
organization, tone of voice, and originality of ideas do not lend themselves readily to computerized scoring.
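
To make the idea of counting numeric indicators in a text file concrete, the following Python sketch computes a few simple surface measures. It is not the CIDWT algorithm, which is not described here. The choice of indicators (word count, sentence count, paragraph count, mean sentence length, and type-token ratio as a rough proxy for word choice) and the function name are assumptions made for illustration only.

```python
import re


def text_indicators(essay):
    """Compute simple surface indicators from the raw text of an essay."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", essay) if p.strip()]
    # Sentences are split naively on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z']+", essay.lower())
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "paragraph_count": len(paragraphs),
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
    }


sample = "The cat sat. The cat ran!\n\nDogs bark."
for name, value in text_indicators(sample).items():
    print(name, value)
```

Counts of this kind say nothing about organization, tone, or originality, which is consistent with the limitation noted above.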

Takayoshi (1996) pointed out that several scholars have identified changes in the actual
processes of writing (invention, drafting, and revision) resulting from the extensive use of computers to
compose text. More specifically, she notes that many contend that the fluid and recursive nature of
writing is becoming more visible with the generation of electronic text, and the writing process is
becoming best conceptualized as a “seamless flow.” Moreover, with the stages of the writing process
becoming less well defined, Takayoshi foresees the need for assessment strategies to reflect this
transformation.



Overriding General Issues 



Individuals involved in assessment of higher education outcomes, such as writing 
competency, need to begin the process with a well-formulated definition of writing. Such a definition 
should not only be formulated within a process framework, but it should also include sensitivity to both 
the specific skills that are easily defined (e.g., use of appropriate grammar) as well as the more complex 
or higher order skills (e.g., developing an argument) that may require careful thought and research to 
delineate precisely. The definition opted for should likewise be consistent with the skills developed in the 
curriculum to ensure that the selection or design of measures is closely integrated with the objectives and 
standards of the educational experiences that students encounter. Once an operational definition is 
developed, assessment personnel should examine the specific purpose of the assessment (how the 
outcome data will be used, what inferences will be made from the data generated, and what changes are 
likely to result), in addition to considering the conceptual and methodological criteria outlined above, to 
select an appropriate existing measure or to help guide the development of a new assessment strategy. 

When the advantages and disadvantages of direct vs. indirect measures are carefully 
analyzed, most professionals arrive at the conclusion that for a complete description of writing ability, a 
combination of the two forms provides the most thorough, methodologically sound, and reasonable 
solution (Miller and Crocker 1990; Swanson, Norman, and Linn 1995). To entirely replace selected 
response measures with essay-type tests or portfolios could be detrimental to writing assessment. As 
Breland (1996) noted, the decontextualized skills measured with multiple-choice tests are
perhaps more readily taught than the ability to generate high-quality text.
Moreover, skills such as learning to recognize problematic elements in writing are important to many life- 
and job-related tasks. The combination of selected and constructed response items enables coverage of 
both the drafting and revision stages of the writing process. Breland has further pointed out that as we 
increasingly include free-response writing in our assessment efforts, research should be devoted to 
identifying the effects of assessment changes on the actual development of students’ writing abilities. At 
this point in time data are not available to demonstrate that the new assessment strategies result in the 
improvement of students’ writing abilities. 



3.4 Writing Templates 

Over the last three decades a number of process-oriented theoretical models have been 
generated by various writing experts. In 1964, Rohman and Wlecke proposed a model of writing that 
entailed conceptualization of the writing process as a linear sequence of activities, each of which could be 
analyzed at a given point in time. Rohman and Wlecke further discussed division of the process into a 
prewriting stage, which occurs prior to the actual construction of a document, and a writing phase, which 
also incorporates rewriting activities. Rohman and Wlecke emphasized a distinction between thinking and 
writing, yet focused on the importance of stimulating, spontaneous, and original thinking as a prerequisite
to high-quality, expressive writing. 

Several theorists subsequently adopted a slightly different approach, continuing to adhere to 
the idea of writing as a process, but preferring a more dynamic, less sequential conceptualization. 
Research conducted by Emig (1971), Faigley et al. (1985), and Sommers (1980) revealed not only that the 
composing process did not necessarily follow a linear path as previously believed, but also that revision 
strategies employed by experienced writers differed qualitatively from those of college freshmen. 
Zemelman (1977), whose ideas about writing clearly diverge from the earlier, linear approach, defined 
writing as “a complex process combining many mental activities, each depending on and influencing 
others: enumerating, categorizing, developing terms, gaining a sense of active participation in a subject, 
sensing and analyzing one’s reactions to a situation, abstracting, seeing new connections and underlying 
patterns, developing arguments, [and] developing hierarchies of significance” (p. 228). 

One of the most prominent models of the writing process to develop out of this second wave 
of theoretical work was one originally proposed by Flower and Hayes (1981) and updated by Hayes 
(1996). The emphasis in their framework is on the writer’s inner, cognitive processing, with “planning,” 
“translating,” and “reviewing” constituting the major classes of mental events that engage the writer.
Flower and Hayes also delineated several subprocesses corresponding to each major process, and they 
contend that the writer monitors his or her movement through different parts of the process based on 
individualized goals, writing habits, and writing style. By incorporating the work of developmental 
psychologists such as Piaget and Vygotsky, Britton (1975) arrived at the conclusion that language is not a
passive means for transcribing knowledge, but is instead inextricably intertwined with thinking and 
learning. 



A third line of theoretical work was initiated by Bizzell (1982), among others, who felt that 
although the model offered by Flower and Hayes provided very useful information pertaining to how 
writers compose, the model neglected the social element of writing. Bizzell described the social context 
of writing as involving more than just a connection to the audience, incorporating the expectations of the 
community with which the writer is affiliated as well. Similarly, Faigley et al. (1985) have suggested that 
an attempt to understand fully the writing process requires that we “look beyond who is writing to whom 
[and look instead] to the texts and social systems that stand in relation to the act of writing” (p. 539). 
TEMPLATES — WRITING COMMERCIALLY DEVELOPED TESTS
■Demonstrates incompetence. Such a paper is seriously flawed by one or more of the following weaknesses:
• very poor organization;
Scale Definition/Rubric/Specificity of Anchor Points
category may reveal one or more of the following weaknesses:
• inadequate organization or development;
4 — May display some errors in usage, but no consistent pattern is apparent.

5 — Have few errors in usage.
major topics are supported by concrete, specific detail.
PREFACE



The National Postsecondary Education Cooperative (NPEC) was authorized by Congress in 1994. 
It charged the National Center for Education Statistics to establish a national postsecondary cooperative 
to promote comparable and uniform information and data at the federal, state, and institutional levels. In 
accordance with this charge, the projects supported by the Cooperative do not necessarily represent a 
federal interest, but may represent a state or institutional interest. Such is the case with this Sourcebook. 
While there is no federal mandate to assess the cognitive outcomes of postsecondary education, some 
states and many institutions have identified cognitive assessment as a way of examining the outcomes of 
their educational programs. This project was undertaken to facilitate these efforts. 

The National Postsecondary Education Cooperative (NPEC), in its first council meeting held in the 
fall of 1995, identified student outcomes as a focus area. The NPEC Steering Committee appointed two 
working groups, Student Outcomes from a Policy Perspective and Student Outcomes from a Data 
Perspective, to explore the nature of data on student outcomes and their usefulness in policymaking. The 
exploratory framework developed by the policy working group is presented in the paper Student 
Outcomes Information for Policy-Making (Terenzini 1997) (see http://nces.ed.gov/pubs97/97991.pdf ). 
Recommendations for changes to current data collection, analysis, and reporting on student outcomes are 
included in the paper Enhancing the Quality and Use of Student Outcomes Data (Gray and Grace 1997) 
(see http://nces.ed.gov/pubs97/97992.pdf ). Based on the work undertaken for these reports, both working 
groups endorsed a pilot study of the Terenzini framework and future research on outcomes data and
methodological problems. 

In 1997, a new working group was formed to review the framework proposed by Terenzini vis-a-vis
existing measures for selected student outcomes. The working group divided into two subgroups. One
group focused on cognitive outcomes, and the other concentrated on preparation for employment 
outcomes. The cognitive outcomes group produced two products authored by T. Dary Erwin, a consultant 
to the working group: The NPEC Sourcebook on Assessment, Volume 1: Definitions and Assessment
Methods for Critical Thinking, Problem Solving, and Writing; and The NPEC Sourcebook on Assessment,
Volume 2: Selected Institutions Utilizing Assessment Results. Both publications can be viewed on the
NPEC Web site at http://nces.ed.gov/npec/ under “Products.” 

The NPEC Sourcebook on Assessment, Volume 2: Selected Institutions Utilizing Assessment 
Results, provides eight case studies of institutions that have addressed policy-related issues through the 
use of the assessment methods. Administrators, faculty, and others in postsecondary education can use 
Volume 2 as a resource to learn about how these eight institutions are using student outcomes assessment 
methods for both internal and external policy-related purposes. 

Working group members, a consultant to the group, testing companies, test developers, and heads 
of higher education organizations identified the institutions presented as case studies in Volume 2. These 
institutions are illustrative rather than representative of all types of higher education institutions. The 
NPEC Sourcebook on Assessment, Volume 2, is designed to convey the experiences of these eight 
institutions in using higher education assessment data of student competencies in the areas of writing and 
critical thinking. The analyses are not an endorsement or a criticism of any specific assessment method. 

The NPEC Sourcebook on Assessment, Volume 1, a companion to Volume 2, is a compendium of 
information about specific tests used to assess critical thinking, problem solving, and writing cognitive 
skills. The interactive version of Volume 1 (see http://nces.ed.gov/npec/evaltests/) allows users to specify 
their area(s) of interest and create a customized search of assessment measures within the three domain 
areas: critical thinking, problem solving, and writing. 





Your comments on the case studies are always welcome. We are particularly interested in your 
suggestions concerning student outcomes variables and measures, potentially useful products, and other 
projects that might be appropriately linked with future NPEC student outcomes efforts. Please e-mail your 
suggestions to Nancy Borkow (Nancy_Borkow@ed.gov), the NPEC Project Director at the National 
Center for Education Statistics. 



Toni Larson, Chair 

NPEC Student Outcomes Pilot Working Group: 
Cognitive and Intellectual Development 



EXECUTIVE SUMMARY 



In 1994, the United States Congress authorized the establishment of the National 
Postsecondary Education Cooperative (NPEC) under the auspices of the National Center for Education 
Statistics (NCES). NPEC’s overarching goal is to produce better decisions through better data. This 
Executive Summary describes one project undertaken by NPEC. 

At the first NPEC Council meeting, “student outcomes” was identified as an issue of great 
importance to higher education. Since NPEC’s inception, several working groups have focused on 
selective aspects of this topic. The NPEC Sourcebook on Assessment, Volume 2: Selected Institutions 
Utilizing Assessment Results (Erwin 2000), the main focus of this Executive Summary, is just one of the 
products produced by NPEC’s Student Outcomes Pilot Working Group: Cognitive and Intellectual 
Development. 

The main purpose of the NPEC Student Outcomes Pilot Working Group project is to find a 
better way to link student outcomes information with decisionmaking by external constituents and 
policymakers. In 1996, during the first phase of the Student Outcomes project, an NPEC working group 
developed a framework for linking student outcomes to policy issues. The framework is described in 
Student Outcomes Information for Policy-Making (1997), written by Patrick T. Terenzini, a consultant to 
the project. In 1997, another working group was appointed and given the task of applying the framework 
to outcome variables in the cognitive and intellectual development domain. A pilot test was conducted 
that examined the effectiveness of applying specific criteria described in the framework to cognitive and 
intellectual development in the context of policy issues. 

The framework presented in the Terenzini paper has four parts: (1) a taxonomy of 
postsecondary education policy issues, (2) a taxonomy of student outcomes, (3) a matrix for linking 
student outcomes and policy issues, and (4) a set of criteria divided into three screens (i.e., first screen — 
relevance, utility, applicability; second screen — interpretability, credibility, fairness; third screen — scope, 
availability, measurability, cost) for evaluating whether information about a given student outcome 
variable is valuable for policymaking. 
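
The screening step in part (4) can be illustrated with a short Python sketch. The screen and criterion names are taken from the summary above. The pass/fail ratings, the treatment of the screens as sequential gates, and the function name first_failed_screen are assumptions made for illustration and are not drawn from the Terenzini paper.

```python
# Screens and criteria as listed in the framework summary above.
SCREENS = [
    ("first", ["relevance", "utility", "applicability"]),
    ("second", ["interpretability", "credibility", "fairness"]),
    ("third", ["scope", "availability", "measurability", "cost"]),
]


def first_failed_screen(ratings):
    """Return the name of the first screen with an unmet criterion, or None.

    `ratings` maps a criterion name to True (met) or False (not met).
    Treating the screens as sequential pass/fail gates is an assumption
    of this sketch; a missing rating counts as not met.
    """
    for screen_name, criteria in SCREENS:
        if not all(ratings.get(criterion, False) for criterion in criteria):
            return screen_name
    return None


# Hypothetical ratings for one outcome variable (illustrative values only).
example = {name: True for _, criteria in SCREENS for name in criteria}
example["cost"] = False
print(first_failed_screen(example))  # third
```

In this hypothetical case the outcome variable clears the first two screens and is held back only by cost, which is the kind of distinction the three-screen arrangement is meant to surface.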

The Student Outcomes Pilot Working Group selected three outcome variables — problem 
solving, critical thinking, and writing — in the cognitive and intellectual development domain. The NPEC 
Sourcebook on Assessment, Volume 1: Definitions and Assessment Methods for Critical Thinking, 
Problem Solving, and Writing (2000), was also developed by T. Dary Erwin. It is a compilation of tests 
that measure these three variables in students. Beyond its usefulness for the student outcomes project, the 
sourcebook is designed to help institutions and states select methods that assess the three cognitive 
outcomes. The sourcebook includes an analysis of scope, availability, measurability, cost, and other 
methodological considerations for the various test instruments included in the book. 

In the next phase of the Student Outcomes Pilot Working Group project, (1) sites were 
identified where several of these assessment methods described in the sourcebook are used, (2) a 
questionnaire was developed for use in the interview process, and (3) telephone interviews were 
conducted with people at eight postsecondary sites. The eight institutions selected for the case studies 
segment of the project were as follows: Eastern New Mexico University (Portales and Roswell), East 
Tennessee State University, Mercer County Community College, Northwest Missouri State University, 
Santa Fe Community College, Southeast Missouri State University, Tennessee State University, and 
Washington State University. The individual interviewed at each site was someone actively involved in 
student assessment. The NPEC Sourcebook on Assessment, Volume 2: Selected Institutions Utilizing 
Assessment Results (Erwin 2000) presents the results of the case studies conducted as part of the Student 
Outcomes Cognitive project. 



The main purpose of the case study project was to discover when and how student outcomes 
assessments in the three cognitive areas are used. In this instance, the case study approach was not 
intended to provide in-depth insights into the many aspects of student assessments. 

A most important finding of the project is that information on student outcomes is typically 
not used outside the boundaries of the campus. Several other common themes emerged from the case 
studies: 



• The primary goal of student outcomes assessment is to understand student 
competencies in order to facilitate improvements in curricula and teaching methods. 

• Assessment is used most often by and within institutions for institutional 
improvement, by campus boards, and by accreditation agencies. External usage by 
legislative and executive branches and other bodies is limited. 

• The data from the assessment process can be used for funding, accreditation, program 
restructuring, and remediation decisions. 

• For half the institutions where interviews were conducted, assessment is mandated by 
the state. 

• There is general satisfaction with the assessment methods used but also a desire for 
additional methods in other areas of general education. 

• There is a desire for the design of computer-based assessment methods. 

• Faculty members are involved in and supportive of the assessment process. 

• Campuses are encouraging more faculty development through conferences and other 
activities. 

• Campuses have considerable interest in developing local assessment methods, 
particularly in the area of writing competencies. 

• Data collection is limited and difficult, and scoring is complex. 

• Institutions see a strong need for flexibility in the use of assessments, and there is a 
movement away from a single exam. 

• Students must be motivated to take assessment seriously. 

• Collaboration with other institutions is a growing trend. 

• The political atmosphere will influence assessment and will probably lead to more 
state mandates in this area. 

Based on information from these institutions, the author identified some issues that were 
considered likely to arise. 

• Expect measures to be mandated in other states that have norm-referenced rankings 
that can be used for comparative purposes or for performance budgeting. External 
constituents still find institutional averages an easy referent to understand. 

• Although some states mandated assessment measures that could be interpreted as 
norm referenced, these measures were later replaced by institutions seeking more up- 
to-date measures more valid for their curricula. There was widespread use but 
movement away from the American College Test — College Outcomes Measures 
Project (ACT — COMP), College Level Academic Skills Test (CLAST), and New 
Jersey College Basic Skills Placement Test (NJCBSPT). 

• There was movement toward seeking more criterion-referenced interpretation in 
outcome measures. For instance, several schools are now using ETS’s Academic 
Profile with its levels of proficiency. For some schools, this action meant more 
locally developed measures, but most institutions lack the expertise and resources to 
design credible measures. Couple this pursuit for measures of diagnostic criteria with 
the desire to improve programs internally, not just to respond to state mandates. 



• Although the schools contacted for this study felt comfortable responding to external 
policy questions about writing and critical thinking, several schools were less 
comfortable responding to questions about other areas in general education. 
Experiments with the Academic Profile and College-BASE tests were mixed. There 
is a need for measures in other areas of learning and development. 

• Several institutions were successful in obtaining state monies for instructional 
improvements. Identifying weaknesses through assessment and trying to correct them 
were generally well received externally. Other schools would be wise to act in similar 
ways rather than sit back and wait for less educationally relevant mandates to come 
down from funding sources. 

• There has been greater use of technology in instructional delivery and testing. Several 
of these colleges, although campus based, are experimenting with Web-based 
courses. Also notable was a trend away from paper and pencil tests to computer- 
based tests such as Accuplacer or Compass. Groups revising existing outcome 
measures or creating new measures should seriously consider computer-based tests 
that can deliver new types of multimedia-based questions or adaptive tests. 
Computer-adaptive tests tailor each test question to the student’s ability as 
determined by performance on prior test questions. 

• All of the colleges contacted for this study expect greater accountability demands 
about higher education in general, not just for their individual institutions. The 
thought of a common set of assessment methods concerns many administrators and 
faculty, but the institutions described herein are preparing for that possibility. 

Based on the findings from the two phases of the Student Outcomes Pilot Working Group 
project, the group has recommended that subsequent steps be taken: 

• Expand the sourcebook to include other variables. 

• Expand the sourcebook to include other types of measures (e.g., portfolios, 
competencies). 

• Link with other similar projects to bring the findings together and produce more 
information for practitioners. 

• Identify ways to make the information more accessible and useful for decisionmaking 
(e.g., using the NPEC Web site, sponsoring forums). 

Identifying, measuring, and using student outcomes information is a priority area for NPEC. 
To fulfill the challenge before NPEC — to elicit more readily available, better, and more usable 
information — the task continues. Future projects will need to address how campus-based assessment 
information can be more effectively and completely linked to decisionmaking at all levels — student, 
parent, campus, accreditation, and government. 
INTRODUCTION 



Higher education assessment data pertaining to student competencies in the areas of writing 
and critical thinking have been used increasingly in recent years to address various policy questions. 
More specifically, colleges and universities are generating student outcomes data to support funding 
decisions, to meet accreditation requirements, to determine employer satisfaction with the skills of 
graduates, and to address the needs of diverse student populations that are of concern to external stakeholders. 
Unfortunately, information about the degree to which assessment data are being used for external 
purposes is not widely available. Therefore, the primary objective of this project was to compose a series 
of case studies, based on the experiences of a variety of different types of institutions, to provide highly 
visible examples of the successful use of assessment data for external policy-related decisionmaking 
purposes. Publication over the Internet will enable administrators and faculty affiliated with other 
colleges and universities throughout the country to learn from the experiences of others in order to derive 
effective methods for appropriately addressing pressing policy questions. Participation in this effort was 
limited to a few selected schools; the procedures used to identify appropriate institutions, along with the 
methods used to acquire the information necessary for formulating the case studies, are outlined below. 



METHODOLOGY 

From the outset, the goal was to include institutions that differed in geographic location, 
size, type, and actual assessment methods used. However, this sample of institutions is not to be taken as 
representative of the types of postsecondary education. This report conveys the experiences of eight 
different institutions. Fourteen institutions were originally contacted and invited to participate. A few of 
the individuals who were contacted believed that they could not devote the time required to adequately 
address the project. Other reasons for declining participation were varied. For example, the representative 
of one institution mentioned that the institution was currently restructuring its entire assessment program. 
He felt that what the institution would be doing in the near future had relevance to the project, but that 
previous work in assessment was probably not relevant to this study. 

The process of identifying potential institutions began by contacting members of the Student 
Outcomes Pilot Working Group: Cognitive and Intellectual Development, of the National Postsecondary 
Education Cooperative, testing companies, test developers, and heads of higher education organizations in 
a number of different states throughout the United States. The names of test developers were obtained from Volume 1 
(see http://nces.ed.gov/npec/evaltests for this sourcebook, which reviews major critical thinking, problem 
solving, and writing collegiate assessment methods), and assessment methods are listed in appendix D, 
Assessment Methods Reviewed for Sourcebook. Each of these information sources was asked to provide 
the names of institutions that have successfully used assessment data to address policy issues. Often, the 
name of a key contact person was provided as well. In cases in which names were not given, academic 
affairs offices were contacted to identify the most appropriate individuals to contact regarding possible 
participation. Once a list of institutions and affiliated personnel was composed, Web sites were visited to 
gather background information pertaining to each of the colleges and universities and to locate any 
information relevant to their assessment work. Telephone calls were then made to explain the study, 
derive more information regarding assessment practices, and ascertain interest in the project. Based on 
this preliminary screening, letters inviting administrators to participate were mailed. A copy of the survey 
to be used as the basis for the 30-minute interview was enclosed to enable potential participants to make 
an informed judgment regarding the appropriateness of including their respective universities in the 
project and to prepare for the interview in the event that they agreed to participate. The survey is provided 
in appendix A, Case Study Questions. Approximately 1 to 2 weeks after the letters were sent, calls were 
made to schedule interviews with those who remained interested. A number of the interviews went 
beyond 30 minutes, yet none of them exceeded 60 minutes. Extensive notes were taken during the 
interview, and the case studies were composed using a general framework (see appendix B, NPEC Case 
Study Categories). 
ANALYTIC APPROACH 



The institutions included in this report vary in size, geographic location, and mission, and the 
history and scope of assessment efforts were likewise found to differ considerably from one institution to 
the next. Nevertheless, a number of common themes emerged that provide considerable insight into the 
climate of the practice of student assessment for the purpose of addressing policy questions in the United 
States. This final segment of the document attempts to compile the diverse experiences of the institutions 
examined. It is hoped that the reader will be provided with a sense of where higher education stands 
regarding the use of student writing, critical thinking, and problem-solving outcome data for external 
decisionmaking purposes. 



FINDINGS 

The primary stated goal of student outcomes assessment voiced by the administrators polled 
was understanding student competencies to facilitate improvements in curricula and teaching methods. 
Although the administrative representatives interviewed all tended to have more of an internal focus, they 
were all using assessment data for external decisionmaking to some extent (e.g., for accreditation), and 
they all seemed aware that external demands for student outcomes data pertaining to writing and critical 
thinking were likely to increase in the future. Many of those interviewed anticipated statewide 
accountability in the form of performance-based funding and mandated assessment. As a result, a number 
of institutions seemed to be acting in anticipation of mandated assessment. In their attempts to be well 
prepared for what is anticipated, several institutions were engaging in self-study of their courses and 
programs, piloting instruments, and attending professional development workshops. 

Institutional representatives seemed to be motivated not only by expected legislative changes 
but also by an appreciation for the use of assessment to enhance educational quality. A number of 
administrators conveyed success stories in which initial assessment data suggested very low student 
competencies in the areas of writing and critical thinking. These data prompted serious consideration of 
the objectives of particular programs, extensive consultation with professionals beyond the local campus 
setting, collaborative efforts within the institutions, and changes to the content and delivery of courses, 
with the result that student competencies were enhanced. Many of those interviewed mentioned initial 
frustration with low scores and a sense of not knowing where to start with changes. However, once the 
wake-up call was heeded and positive changes were introduced, faculty and administrators tended to gain 
a more comprehensive understanding of the importance of assessment. 

According to the experiences of those interviewed, promotion and tenure decisions for 
individual faculty members are not currently based on assessment data. Nonetheless, substantial changes 
to curricular offerings and program modifications have resulted from the data generated, creating both the 
development of new positions and the elimination of existing positions. 

There is also considerable evidence of institutions collaborating with other colleges and 
universities within their respective states in an effort to conduct meaningful assessments of student 
outcomes. The sharing of experiences and knowledge across institutions seems to be occurring much 
more frequently than in the past, with a great deal of interest expressed about how others are approaching 
various assessment issues. A few schools mentioned that committees were formed with representatives 
from several institutions across the state to locate appropriate assessment measures, coordinate multi- 
institution piloting of commercially available tests, and possibly develop new assessment methods 
specifically designed to address the student population in a particular state. 



A number of administrators mentioned that their institutions were encouraging faculty 
development through funding attendance at national teaching conferences where faculty could learn 
teaching methods for stimulating critical thinking and the development of writing skills. Institutions have 
also often financed speakers and professionals to conduct faculty development seminars. Funds for 
bringing in external review teams have also been more available than in the past. 

Some reluctance for using commercially developed instruments was revealed in the 
interviews, with considerable interest in and plans for developing local assessments, particularly in the 
area of writing competency. The dissatisfaction that was voiced related primarily to perceptions that the 
content of commercial tests inadequately matched the skills believed to be developed in local curricula. A 
number of individuals mentioned course-embedded assessments of writing, using authentic curricular 
products. Concerns about the appropriateness of many commercially available tests for documenting the 
skills and needs of diverse student populations (e.g., first-generation college students, rural residents, 
older students, and economically disadvantaged students) were also mentioned. On the other hand, a 
number of the institutional representatives voiced apprehension about exclusive reliance on locally 
developed tests, stressing the importance of knowing how their students compared to others nationally. 
Many schools seem to be heading toward using a combination of locally developed and nationally 
normed assessment methods. 

A trend away from state-developed placement tests such as New Jersey’s College Basic 
Skills Placement Test (NJCBSPT) and Florida’s CLAST was evidenced in the conversations. This change 
seems to be predicated on the advantages of using one of the commercially available computer adaptive 
tests such as the Accuplacer. 

Motivating students to take assessments seriously when the results do not preclude further 
study or graduation or have any other direct implications for individual students is an issue encountered 
by most institutions. A variety of approaches have been tried in addressing this issue. Most common 
among these approaches are the use of incentives such as raffles, gifts, and cash for students achieving 
particular scores, along with educational programs designed to help students understand the importance of 
assessment for promoting quality programs and services. Another strategy is to send students’ scores to 
their advisors, who may use the information in composing future student references. 

Few institutions collected data that they were not using, and most of the interviewees 
mentioned the need for data that are not currently available. A couple of administrators indicated the need 
for mid-career and senior assessments for the purpose of conducting pre- and post-longitudinal studies of 
program effectiveness. Others noted the need for assessment methods that are directly linked to the 
missions of their institutions. For example, stimulating interest in life-long learning is an often cited 
objective of undergraduate education, but little is known about how it is achieved or measured. 



CONCLUSION 

Personnel affiliated with each institution highlighted in this project should be commended 
for their success in using student outcome data to effectively improve the quality of the educational 
opportunities provided. Moreover, the institutions included herein were selected based on their efforts to 
address policy-related assessment issues. The innovation and diligence exemplified by their efforts to 
move in this direction can serve as excellent models to inspire others to follow. 

The table presented on the following pages summarizes the institutional responses to the 
questionnaire in appendix A. 
FUTURE ISSUES 



What role will student outcome assessment have in postsecondary institutions in the future? 
What can be learned from these institutions with active assessment programs? 

At the outset of this project, the NPEC Student Outcomes Pilot Working Group and this 
author expected more widespread use of assessment data for external policy purposes. Certainly, the 
rhetoric associated with accountability data related to student learning is clear. “Institutions of higher 
learning are going to have to do a far better job of explaining what they are asking people to pay for, and 
what the value of it is” (Chauncey 1995, 30). The institutions in this review anticipate that 
performance-based funding mandates will increase but are “wary of the prospect.” Based on information from these 
institutions, here are some issues that are likely to arise. 

First, expect measures to be mandated in other states that have norm-referenced rankings 
that can be used for comparative purposes or for performance budgeting. External constituents still find 
institutional averages an easy referent to understand. 

Second, although some states mandated assessment measures that could be interpreted as 
norm referenced, these measures were later replaced by institutions seeking more up-to-date measures 
more valid for their curricula. Note the widespread use, but movement away from, the American College 
Test— College Outcomes Measures Project (ACT— COMP), College Level Academic Skills Test 
(CLAST), and New Jersey College Basic Skills Placement Test (NJCBSPT). 

Third, and similarly, note the movement toward more criterion-referenced interpretation in 
outcome measures. For instance, several schools are now using ETS’s Academic Profile with its levels of 
proficiency. For some schools, this action meant more locally developed measures, but most institutions 
lack the expertise and resources to design credible measures. Couple this pursuit for measures of 
diagnostic criteria with the desire to improve programs internally, not just to respond to state mandates. 
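
The difference between the two kinds of interpretation can be sketched briefly in Python. A norm-referenced reading places a score relative to a comparison group, while a criterion-referenced reading places it against fixed proficiency cut scores. The scores, the norm group, and the cut scores below are hypothetical and do not reproduce the proficiency levels of the Academic Profile or any other published test.

```python
from bisect import bisect_right


def percentile_rank(score, norm_group_scores):
    """Norm-referenced: percent of the norm group scoring at or below `score`."""
    ordered = sorted(norm_group_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def proficiency_level(score, cut_scores):
    """Criterion-referenced: number of cut scores that `score` meets or exceeds."""
    return bisect_right(sorted(cut_scores), score)


# Hypothetical data for illustration only.
norm_group = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
cuts = [50, 65, 80]  # thresholds for levels 1, 2, and 3

print(percentile_rank(65, norm_group))  # 60.0
print(proficiency_level(65, cuts))      # 2
```

The same score of 65 yields a ranking in the first case and a statement about what level of proficiency was reached in the second, which is why the latter is often preferred for diagnosing and improving programs.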

Fourth, although these schools felt comfortable responding to external policy questions 
about writing and critical thinking, several schools were less comfortable responding to questions about 
other areas in general education. Experiments with the Academic Profile and College-BASE tests were 
mixed. Certainly there is a need for measures in other areas of learning and development. 

Fifth, several of these institutions were successful in obtaining state monies for instructional 
improvements, suggesting that a proactive strategy was worth the effort. Identifying weaknesses through 
assessment and trying to correct them were generally well received externally. 

Sixth, note the greater use of technology in instructional delivery and testing. Several of 
these colleges, although campus based, are experimenting with Web-based courses. Also, notice the trend 
away from paper-and-pencil tests to computer-based tests such as Accuplacer or Compass. Groups 
revising existing outcome measures or creating new measures should seriously consider computer-based 
tests that can deliver new types of multimedia-based questions or adaptive tests. Computer-adaptive tests 
tailor each test question to the student’s ability as determined by performance on prior test questions. 
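
The adaptive idea described above can be illustrated with a minimal Python sketch. It assumes a simple up/down rule: a correct answer leads to a harder item and an incorrect answer to an easier one. The function name, the item pool, and the simulated student are hypothetical, and the sketch is not a description of how Accuplacer, Compass, or any other commercial test selects items.

```python
def run_adaptive_test(item_difficulties, answers_correctly, num_questions=5):
    """Administer items one at a time, adapting difficulty to prior answers.

    item_difficulties: sorted list of difficulty levels in the item pool
                       (hypothetical values).
    answers_correctly: callable taking a difficulty and returning True/False;
                       it stands in for the student's response.
    Returns the list of difficulty levels administered, in order.
    """
    index = len(item_difficulties) // 2  # start with a mid-difficulty item
    administered = []
    for _ in range(num_questions):
        difficulty = item_difficulties[index]
        administered.append(difficulty)
        if answers_correctly(difficulty):
            index = min(index + 1, len(item_difficulties) - 1)  # harder next
        else:
            index = max(index - 1, 0)  # easier next
    return administered


# Hypothetical student who answers correctly on items of difficulty 6 or below.
pool = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sequence = run_adaptive_test(pool, lambda difficulty: difficulty <= 6)
print(sequence)  # [5, 6, 7, 6, 7]
```

The administered difficulties settle around the point where the simulated student begins to miss items, so fewer questions are spent on material that is far too easy or far too hard.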

Seventh and last, all of these colleges expect greater accountability needs about higher 
education in general, not just for their individual institutions. It would be desirable for all of higher 
education if collective groups of postsecondary institutions, such as all 4-year colleges within a given 
state, were able to tell an aggregated, single story about the value of higher education. The thought of a 
common set of assessment methods raises concerns for many administrators and faculty, but the 
institutions described herein are acting toward that possibility. Hopefully, educational institutions will 
lead with the selection and design of their own common assessment. 
Future demands on institutions of higher learning requiring clear specification of curricular 
objectives, precise descriptions of what colleges and universities are purporting to do in the classroom 
context, and provision of convincing evidence that they are achieving their objectives efficiently, can only 
be expected to increase. Further, the demand for increased accountability has naturally led to greater 
government oversight and regulation in higher education. As colleges and universities are increasingly 
being held responsible for the writing and critical thinking competencies of their graduates, it behooves 
institutions to generate credible data needed for external as well as internal audiences. 
Eastern New Mexico University 

Portales, New Mexico 

Interviewee: Dr. Alec M. Testa, 
Executive Director of Planning and Analysis 



Institutional Background 

Eastern New Mexico University (ENMU), established in 1934, is a regional comprehensive 
university encompassing three separate facilities. The main campus is located in Portales, a city with a 
population of 12,000, near the eastern border of the state. A 2-year branch campus is located in Roswell, 
in the Pecos River valley, and an off-campus instructional center is situated in Ruidoso, in the mountains 
west of Roswell. Enrollment at the Portales campus is approximately 4,000 (57 percent female) with 47 
undergraduate and 15 graduate degree programs offered in liberal arts and sciences, education, business, 
fine arts, and selected vocational/technical areas. 

Eastern New Mexico University is committed to continuous self-examination and has a 
history of innovation directed toward enhancement of the quality of education provided to students. The 
university has invested over 10 years in outcomes assessment, leading the state and much of the 
southwestern United States in higher education assessment. ENMU conducts outcomes assessment with 
the primary goal of enhancing understanding of student learning and growth to facilitate improvement of 
programs and services. The Assessment Resource Office is currently funded at a rate of $150,000 per year 
through a research and public service project assistance program with the New Mexico legislature. The 
Assessment Resource Office’s stated purpose is “to support the University’s ongoing analysis of its 
growing body of assessment data, to broaden the scope of Eastern’s outcomes assessment and 
teaching/learning efforts, to disseminate these findings within the state, and to enhance student learning.” 



Description and History of the Assessment Method 

In 1986, when ENMU initiated its assessment program, it used the ACT-COMP test.
However, ENMU switched to the Collegiate Assessment of Academic Proficiency (CAAP) in 1993 
because of the closer content match between items on the CAAP and the ACT entrance exam. This match 
facilitated longitudinal studies of student achievement. Dr. Testa further noted that the choice of the 
CAAP was motivated by close observation of the success of other schools, such as Northeast Missouri 
State (now Truman State University). 

Both the CAAP Writing and Critical Thinking tests are administered to ENMU’s rising 
juniors (those having completed 55-65 credit hours). Assessment at ENMU has expanded to include 
measures of academic achievement in the majors, students’ values and attitudes, and students’ reported 
satisfaction with the university as well. CAAP writing scores have been centered around the national 
mean for 4-year public colleges in recent years. Moderate correlations between two introductory English 
courses and CAAP writing scores were recently reported (R’s = .44 and .49). CAAP assessment data are 
not used to determine advancement or graduation for individual students, but Dr. Testa mentioned that
ENMU is considering establishing a passing criterion score because low student motivation on the 
standardized tests has become a pressing concern in recent years. 





Use of the Data to Address Policy Issues 



Performance-based funding does not currently exist in New Mexico, but Dr. Testa estimated 
the probability of statewide accountability in the future at about 50 percent. Although previous initiatives 
in this direction were blocked in the legislature, support for state-mandated testing is growing. ENMU’s 
early recognition of the need for colleges and universities to monitor and measure their efforts has 
positioned the institution well should the transition to statewide accountability occur. The initial and 
continued use of assessment data is primarily for program enhancement and for accreditation purposes. 
ENMU is accredited by the North Central Association of Colleges and Secondary Schools, and a number 
of the graduate programs are accredited by various agencies (e.g., NCATE). 

Formative personnel decisions (e.g., promotion and tenure) at ENMU are generally not 
based on test data. However, Dr. Testa mentioned that data generated from an ETS major field test were 
used to build a case for a new faculty member with expertise in cellular biology for the Biology 
department. A similar case occurred in the Economics department. 

The Assessment Resource Office at ENMU has conducted extensive employer surveys to
assess the degree to which employers of Eastern graduates believe ENMU’s former students are well 
prepared for the workforce. Among the specific skill areas addressed in the employer survey are reading, 
writing, decisionmaking, oral expression, math, listening, creative thinking, recognition of problems, 
computer usage, leadership, trainability, responsibility, and accountability. In the area of written 
communication, 74 percent of the employers surveyed indicated that writing skills were either important 
or very important at their particular agencies, and 58 percent indicated that the writing skills of the 
ENMU graduates were above average. In terms of creative thinking skills and the ability to generate new 
ideas, 74 percent of the employers surveyed mentioned that these skills were important or very important 
in their particular employment contexts, while 58 percent indicated that the ENMU graduates that they 
employed were above average in this skill area. 



Future Political Trends Expected to Have an Impact on Assessment 

Assessment data that exist at ENMU but are not currently being used include those 
pertaining to student satisfaction with services such as advising and financial aid. Dr. Testa also noted 
that incoming freshmen complete an intention to transfer survey, which could be examined more closely 
to develop means for enhancing retention rates. Needed data include tests to address variables that are 
related to the mission of the university, such as students’ interest in life-long learning. Finally, when 
questioned about attempts to derive assessment data to answer policy questions by means other than 
traditional forms of assessment, Dr. Testa indicated that ENMU is exploring alternative methods of 
assessing student learning such as portfolio assessment and locally developed tests. 

ENMU’s assessment efforts have been well received both internally and externally. In 
particular, the funding for the Assessment Resource Office provided by the state is very impressive given 
that it is from nonformula funds. Recognition through financial support by the legislature and the 
governor is unparalleled in the other 23 publicly supported higher education institutions across the state. 



Eastern New Mexico University 
Roswell, New Mexico 
Interviewee: Dr. Judy Armstrong, 
Assistant Dean of Instructional Support 



Institutional Background 



Established in 1958, the Roswell Campus of ENMU is governed by the Board of Regents 
and a Community College Advisory Board composed of representatives of the community school district 
boards. Roswell is located in the eastern area of the southern Rocky Mountains region and is a semi-urban 
community with a population of 52,000. Roswell serves as the main financial, business, medical, and 
transportation center for much of southeastern New Mexico. The curriculum consists of both vocational- 
technical and academic programs with specialties in computer information systems, aviation technology, 
and nursing. Enrollment is approximately 2,600, with 1,600 full-time students (32 percent academic 
transfers, 19 percent vocational-technical, 44 percent nondegree seeking, and 5 percent concurrent 
enrollment). The average student age is 32; 60 percent of the students are female. The ethnicity of the 
students represents the surrounding region (57 percent Caucasian, 35 percent Hispanic, 3.5 percent Native 
American, 3 percent Black), and approximately 70 percent receive financial aid. In 1991, Roswell was put 
on a 10-year continuing accreditation cycle by the North Central Association of Colleges and Schools. 




Description and History of the Assessment Method 



Students in the academic transfer track take the CAAP after completing their studies, and 
those in the vocational track are administered the Student Occupation Competency Aptitude test. 
Assessment was not mandated, but between 1985 and 1986, a task force was developed to examine the 
college’s assessment policies. Roswell decided to adopt a nationally normed assessment measure, based 
on its interest in determining how well its students were achieving compared to others around the country. 
The CAAP was chosen based on the congruence between the test content and Roswell’s curricular goals. 



Use of the Data to Address Policy Issues 



The initial intended use of the CAAP data was to identify curriculum weaknesses so that 
instructional changes designed to build student competencies in needed areas could be introduced. The 
early assessments revealed student deficiencies in critical thinking skills. The college responded by 
providing in-service speakers to teach the faculty about critical thinking and to introduce teaching 
methods designed to develop critical thinking competencies. Five Roswell faculty members attended 
national conferences and shared the information with their colleagues. Extensive changes were made to 
the curriculum, and comparisons between pre- and post-data indicated that students were becoming more 
skilled in this area as a result of their classroom experiences at Roswell. The institution has worked 
diligently to provide critical thinking skills training across the curriculum, and it now offers a course in 
critical thinking. Data generated with the CAAP are also used for accreditation purposes. The data are not 
used for individual summative faculty evaluation purposes, yet program modifications have resulted in 
personnel changes that have been introduced based on assessment data. 

Dr. Armstrong mentioned that some data, such as results from the Pre-Professional Skills 
Test, that are not currently being used to address policy questions could theoretically be used in the 
future. Roswell is developing an assessment of writing competency and is considering the use of a 
portfolio in the future. 

Implications of the Data Generated



Dr. Armstrong noted that the faculty have witnessed positive changes in the curriculum
based on information derived from the CAAP, and they are generally very supportive of assessment 
efforts. However, she also added that it has been frustrating at times to identify exactly what changes are 
needed to develop particular skills. Stakeholders have generally been very satisfied with assessment 
efforts. The Board of Regents has also been pleased with assessment efforts undertaken at Roswell. 
Employer survey data indicate that 90 percent of employers are content with the knowledge and skills of 
Roswell graduates. Data from the main campus in Portales further indicate that Roswell transfer students 
achieve comparable or better grades, on average, than students who enroll as freshmen at the main 
campus. Alumni data suggest that students are satisfied with the education that they receive at Roswell as 
well. Freshmen at Roswell complete an essay at the end of a College Success course; Dr. Armstrong 
noted that approximately 10 percent report that attending college has changed their lives entirely. 



Dr. Armstrong noted that the legislature is trying to pass an accountability report card in the
state, and, in response to this anticipated change, 17 community college presidents have developed a 
council with the explicit purpose of sharing experiences and coordinating assessment efforts. 



Future Political Trends Expected to Have an Impact on Assessment 



As advice for future policymakers, Dr. Armstrong recommended greater emphasis on
performance-based measures, noting that interpretation of figures alone can be frustrating when educators 
are seeking substantive information about how to fortify educational experiences. She also emphasized 
the importance of collecting longitudinal data over several years before implementing major changes. She 
expects assessment in the future to become increasingly technologically based. Finally, Dr. Armstrong 
believes that in the future we will have a much clearer, more standardized understanding of the 
competencies that students should be expected to develop based on their college experiences. 



East Tennessee State University 

Johnson City, Tennessee 

Interviewee: Dr. Cynthia Burnley, 

Coordinator of General Education and Performance Funding 



Institutional Background 



Established in 1911, East Tennessee State University (ETSU) is a state-supported institution 
governed by the Tennessee Board of Regents. The main campus is located in Johnson City, which is in 
the mountain and lake area of the Tri-Cities Tennessee/Virginia region. Off-campus centers include 
ETSU/UT at Kingsport, the Marshall T. Nave Center in Elizabethton, ETSU at Bristol, and ETSU at 
Greeneville. With an enrollment of approximately 12,000 students, the university offers more than 125 
degree programs, including 2-year associate degrees and bachelor’s, master’s, educational specialist, 
doctor of medicine, doctor of education, and doctor of philosophy degrees. Although the majority of 
students (58 percent of whom are female) are from Tennessee and the surrounding southeastern region, 36 
states and 37 foreign countries are represented in the student body. ETSU is also a leader in distance 
education. 



ETSU is accredited by the Southern Association of Colleges and Schools (SACS), and a 
number of degree programs are accredited by agencies in associated disciplines. Nonaccreditable
programs undergo an extensive academic program review every 5 years by a committee consisting of two 
external reviewers in every case. Each committee completes a standard checklist that is uniform for all 
institutions governed by ETSU’s governing board, the Board of Regents. The committee also submits an 
extensive narrative report with recommendations for improvements. Each department is then expected to 
generate a response to the recommendations, which is taken to the dean for approval and planning and 
budgetary considerations. Dr. Burnley stressed that departmental assessment is taken very seriously at 
ETSU, with many improvements in the curricula resulting directly from this process. 



Description and History of the Assessment Method 

Students seeking admission as first-time freshmen must present a minimum composite ACT 
score of 19 or must have earned a minimum high school GPA of 2.3 (on a 4.0 scale). Tennessee residents 
who graduate from public high schools must successfully complete the Tennessee Proficiency 
Examination. Assessments to determine levels of proficiency are also required for entering freshmen who 
present ACT composite, English, or math scores below 19. The Collegiate Assessment of Academic 
Proficiency (CAAP) assesses academic preparation in writing, reading comprehension, and mathematics. 
The CAAP writing sample is a 25-minute, timed essay test designed to measure student ability to use 
standard written English (organization and development of the main idea; use of vocabulary and syntax to 
express ideas clearly; and command of sentence structure, punctuation, spelling, and grammar). 

Performance funding at ETSU is based, in part, on data derived from administration of the 
College Basic Academic Subjects Examination (BASE) following completion of the general education 
curriculum and on senior assessment in the majors, with many departments using an adapted form of the 
ETS Graduate Record Exam. Departments are permitted to add locally developed items to their major 
field tests as well. The focus of this case study is on the College-BASE. Information used for funding 
decisions is also derived from an alumni survey that is sent out to former students 2 years after graduating 
and from an enrolled-student survey that is administered to a random sample of the student population. 
This survey assesses student satisfaction across many areas, including advisement, parking, and diversity
issues. Only responses from students who have completed 24 credit hours or more are analyzed for
performance funding purposes. Written comments are examined systematically using a content analysis
methodology.

In conjunction with the general education program, a number of nonperformance funding 
assessments are conducted at ETSU. For example, a 10-item measure of oral communication proficiency 
is completed by individuals supervising students in out-of-class learning experiences, such as a practicum. 
ETSU is also developing a writing proficiency measure. The general education program is composed of 
several core areas, and faculty in each area meet regularly to conduct a nonmandated self-study of the 
curriculum. Dr. Burnley noted that the faculty recognize the advantages of convening to discuss 
objectives for student learning in the context of general education program review required for funding 
purposes, and consensus resulted in the initiation of self-study efforts. 

When performance-based funding was initially mandated, ETSU used the ACT-COMP.
However, the decision to switch to the College-BASE was made for a number of reasons. Dr. Burnley 
noted that interpretation of the COMP results for improvement of the general education curriculum was
difficult, because that test focuses on application of knowledge rather than on general education knowledge. ETSU
also experienced difficulty getting its students to take the COMP seriously because they frequently found 
the videos amusing and tended to view the assessment as somewhat of a joke. In addition, the College- 
BASE provided a much better match with the skills believed to be developed in the general education 
curriculum. ETSU decided not to use the essay component of the College-BASE because of the amount 
of time and expense involved. In general, both the faculty and the administration are more satisfied with 
the College-BASE. ETSU students take the College-BASE seriously, and motivating them to do their best 
has not been a problem. Although it does not serve as a barrier test, students are told about the connection 
between how well they do and funding for the university. Moreover, students are well aware that their 
individual reports are placed in their files for advisors to use for evaluations. Students also receive a copy 
of their test results. 



Use of the Data to Address Policy Issues 



College-BASE data are used to address various policy issues, the most salient being to 
demonstrate the efficacy of ETSU’s general education program for funding purposes and for SACS 
accreditation. In Tennessee, performance funding is awarded at a rate of 5.45 percent of the state 
appropriation for a given institution. Points can be earned if scores on the College-BASE exceed state or 
national norms. Dr. Burnley noted that since this supplementary funding program has been in effect, 
correspondence with other institutions has increased. There has been much more cross-institution 
collaboration in relation to outcomes assessment, as well as an active exchange of experiences and ideas. 
Comfort with assessment has increased among faculty at ETSU and across the state. Dr. Burnley noted 
that ETSU faculty have moved beyond dissecting every measure to a sensitive appreciation for both the 
value and limitations of assessment. The performance funding program was developed by educators 
rather than by the legislature, and Dr. Burnley believes that this has been an important factor behind the 
acceptance and support evidenced in recent years. Although no summative personnel evaluations are 
made based on assessment data, program changes and reallocation of funds have resulted in new positions 
being allocated and existing positions being phased out. 

The data generated by the initial state-mandated assessments indicated that the core 
curriculum needed to be changed. Modifications were made, resulting in a much more effective general 
education program. Dr. Burnley indicated that different forms of data reporting are generally needed for 
different stakeholders. The state provides a template for submitting assessment results that ensures 
uniformity across institutions and makes the task less cumbersome for individual colleges and 
universities. SACS is more interested in how the data are used, requiring more narrative reporting of 
information. Dr. Burnley noted that more extensive reporting (at the item level) is provided to the various 
departments. 



Implications of the Data Generated 



Dr. Burnley mentioned that ETSU has data that are not currently being used to address 
policy questions (e.g., enrollment and retention data). Assessment data that are not currently available but 
that have received attention by the general education committee include an acceptable writing assessment, 
a critical thinking measure (the Critical-Thinking Assessment Battery (CTAB) is currently being piloted 
at ETSU), and an assessment of familiarity with information technology. The general education 
committee is developing a writing competency measure in the context of the self-study groups described 
previously. As a result of having examined a number of standardized writing assessments and not finding 
a satisfactory one, ETSU’s efforts have shifted to designing a method for assessing writing skills that is 
embedded in coursework. 

Assessment data suggest that students are developing the needed skills and knowledge to 
function well in various employment contexts, to be successful in graduate training programs, and to 
grow as individuals and make worthwhile contributions to society. Moreover, stakeholders are generally 
satisfied with the return on their investment as exemplified by student competencies. Funding in recent 
years suggests this satisfaction, but Dr. Burnley noted that ETSU is working diligently to improve funding
beyond what has been achieved in recent years. 



Future Political Trends Expected to Have an Impact on Assessment 






Assessment data are currently used to prepare for the next accreditation cycle. Dr. Burnley 
mentioned that in the immediate future she sees the use of data to make positive curricular changes as 
being more routine and a part of the culture at ETSU. With regard to future assessment in the long term, 
she anticipates a much greater emphasis on course-embedded assessment that occurs throughout students’ 
careers, rather than assessment as a separate process that is introduced at the beginning or end of various 
milestones. The provost of ETSU has argued for measures other than standardized assessments, and Dr. 
Burnley mentioned that the university has explored the use of portfolio assessments, suggesting a possible 
trend toward locally developed, nontraditional assessments emerging in the future. 

Mercer County Community College

Trenton, New Jersey 
Interviewee: Thomas N. Wilfrid, 

Vice President for Academic and Student Affairs 



Institutional Background 

Mercer County Community College (MCCC), established in 1966, is a publicly supported 
comprehensive institution providing higher education opportunities through an open-door admission 
policy. In the fall of 1996, MCCC enrolled 2,732 full-time students (average age = 23) and 5,148 part- 
time students (average age = 31). Approximately 75 percent of the students are Mercer County residents 
(55 percent are women). 

Transfer degree (AA or AS) programs at MCCC are designed primarily to enable students to 
enter the third year of baccalaureate study at 4-year colleges. The largest student enrollments in transfer 
degree programs are in humanities and social science and in business administration. Additional transfer 
degree programs include architecture, communication and visual arts, engineering science, and plant 
science. Career degree (AAS) programs are designed to prepare graduates for entry-level employment in 
occupations that require theoretical knowledge as well as practical skills. Mercer has AAS programs in 
fields as diverse as nursing, accounting, aviation, chef apprenticeship, surveying, electronics, ornamental 
horticulture, microcomputer systems administration, television, funeral service, and computer graphics. 
With 50 percent of Mercer graduates transferring to senior colleges or universities and 75 percent 
choosing to seek employment, a number actually do both. More than 17,000 additional students are 
enrolled in continuing education programs such as computer training, small business development, health 
career certification, high school equivalency programs, English for the foreign-born, pre-college 
instruction, youth programs, and more. 

MCCC is accredited by the Commission on Higher Education of the Middle States 
Association of Colleges and Schools, and is authorized by the state of New Jersey’s Commission on 
Higher Education to confer associate degrees. Many of the college’s academic programs are also 
accredited by national professional associations and their representative boards of certification. 



Description and History of the Assessment Method 



Mercer has been using the New Jersey College Basic Skills Placement Test (NJCBSPT) in 
response to a state mandate in the early 1980s. The instrument was revised and validated throughout the 
1980s and into the 1990s. Until the early 1990s, the state required extensive reporting of NJCBSPT data. 
However, in 1994, the governor eliminated the Department of Higher Education and replaced it with a far 
less regulatory structure entitled the Commission on Higher Education. This change gave the
individual institutions autonomy and ended further development of the NJCBSPT. The presidents
of higher education institutions across the state remained very invested in placement testing and quality 
service delivery and, in response to this commitment, formed the President’s Council to maintain 
communication across institutions. The council serves as a statewide task force for identifying key issues 
and establishing priorities and guidelines for higher education in the state. One subcommittee of the 
council addresses higher education assessment; Dr. Wilfrid currently serves on this subcommittee. 
Among the many assessment-related recommendations put forth by this subcommittee was one that 
strongly advised every college and university to continue to conduct basic skills placement testing. After 
examining a number of available measures, including the Accuplacer and the Compass, the subcommittee 
recommended use of the Accuplacer, primarily because its development was based largely on the 
NJCBSPT. 






Dr. Wilfrid noted that at Mercer the decision to use the Accuplacer was motivated by the 
subcommittee’s recommendation, as well as the match between item content and the curriculum. Mercer
has been gradually phasing out the NJCBSPT and has been piloting the Accuplacer, with a plan to switch 
over to the Accuplacer completely by August 1998. 



Use of the Data to Address Policy Issues 

Mercer has a strong commitment to serving an urban population, and a large percentage of 
its resources are funneled into remedial education. Several state grants provide supplementary resources 
as well. For example, grant money has been used to fund a program at Mercer entitled “Project Future,” 
which provides basic education to students who demonstrate multiple remediation need areas (deficits in 
reading, writing, and mathematics) on the placement test. Approximately 10 percent of incoming students 
fall into this category (40 percent have at least one area of need), and Project Future serves an average of 
150 students per year. Dr. Wilfrid noted that the need for remediation is quite frequently very substantial, 
yet Mercer is committed to helping students develop the basic skills needed to achieve success in college. 
The process often involves much more than providing remedial courses; the students need a great deal of 
attention and encouragement. Several features of the program have been linked with success. For 
example, Project Future courses meet 2 additional hours per week, which provides more time on task in 
the classroom as well as more time in direct contact with instructors. The faculty-to-student ratio in these
courses is 1:10. Further, Mercer has recruited its highest caliber faculty to teach these courses, and
several counselors work with the students enrolled in the program. Data generated from the placement 
testing, which indicated that a fairly large number of students needed a comprehensive approach to 
remediation, resulted in this positive curriculum change. 



In addition to informing curricular decisions and providing placement information and 
performance feedback to individual students, placement data are also used for accreditation. Formative 
personnel decisions are not made based on assessment data at Mercer; however, student evaluations of 
teaching are used in decisions about which adjunct faculty will be hired each semester. 

In addition to placement testing, Mercer administers a program evaluation survey to every 
other graduating class to assess student satisfaction with the educational training received at Mercer. 
Statewide data suggest that transfer students do at least as well (as reflected by grade point averages) as 
students who spend 4 years at an institution that grants bachelor’s degrees. Employer surveys indicate 
satisfaction with Mercer graduates as well. 

Data that are not presently available but that could theoretically be used to address policy
issues include measures of the success of the curriculum other than student GPAs and
retention rates. Mercer is looking into administering some form of standardized assessment at the end of 
the 2 years of training that would function as a post-test assessment. 



Future Political Trends Expected to Have an Impact on Assessment 



Dr. Wilfrid mentioned that performance-based funding has been discussed both in the 
legislature and by the governor, and presidents and finance officers affiliated with various higher 
education institutions are somewhat wary of the prospect. There is concern that funding decisions will be 
based on political agendas rather than on what will optimize services to students in New Jersey. Now that 
the NJCBSPT is being phased out, Dr. Wilfrid voiced some concern about continued validation of 
measures used in the future. He believes that, in the future, higher education institutions will be managed 
by individuals who make decisions based on sound data. 



Northwest Missouri State University 

Maryville, Missouri 

Interviewee: Dr. David Oehler, 

Director of Assessment and Information Analysis 



Institutional Background 



Northwest Missouri State University (NWMSU) is a state-assisted, 4-year comprehensive 
regional university founded in 1905. The university is governed by a state-appointed board of regents and 
is accredited by the North Central Association of Colleges and Schools. The university is located in 
Maryville, a rural community of 10,000 (90 miles north of Kansas City, 100 miles south of Omaha, 140 
miles southwest of Des Moines). NWMSU confers bachelor’s, master’s and specialist in education 
degrees, and also offers 2-year certificate programs. NWMSU is a moderately selective institution that 
emphasizes programs in agriculture, business, and education. The current enrollment is 6,200. Although 
the university primarily serves 19 northwest Missouri counties, students from 42 states and 22 countries 
are represented in the student body. NWMSU has been a national leader in student-based computer 
technology since 1987. The university’s “electronic campus” provides a networked personal computer in
every residence hall room. 




Description and History of the Assessment Method 

NWMSU administers a number of nationally normed, commercially produced tests. These 
include the Academic Profile, which first-semester seniors are required to take, the CAAP Critical 
Thinking Test, which is given to first-semester juniors, and various major field exams. NWMSU also 
requires students to complete a locally developed end-of-core writing assessment. This is completed at the 
culmination of the composition sequence. Students are provided with two to five current articles 4 days 
before the scheduled essay exam. The exam is timed, with students allowed two 50-minute periods to 
respond to a prompt that requires them to develop an argument citing evidence from at least two of the 
articles, along with their own experience. After composing an initial rough draft during the first 50-minute
period, students compose a final draft during the second 50 minutes. Each exam is holistically
scored by at least two members of the English department faculty, with a third rater appointed if a 
significant discrepancy arises. The review process is blind. The majority of the students pass the exam; 
those who do not are provided an opportunity to write another essay. If the student does not pass the 
second time, the student is able to complete a third essay and submit a portfolio as a backup during the 
next semester in attendance. NWMSU is part of a statewide colloquium on writing assessment, and most 
Missouri schools are administering a similar type of exam. This colloquium has provided a forum for the 
exchange of ideas, experiences, and information across institutions. 

For initial placement testing, NWMSU uses a formula derived from ACT scores and high 
school class rank. Incoming students attend an orientation in June during which they receive their 
schedules for the upcoming fall semester. Students find out at this time if they have been placed into a 
developmental writing composition course. If students are placed into the developmental course, they are 
provided with an opportunity during orientation to test out by taking a 1-hour timed essay test, which is a 
personal essay with a prompt that changes each semester and which uses a rubric different from the end- 
of-core rubric. Several years ago, NWMSU used a composition placement test, which was very time 
consuming and burdensome to the faculty to administer and score for 1,300 incoming freshmen. Research 
into a more efficient method revealed that use of the ACT scores in conjunction with high school rank 
was as reliable a placement strategy as the essay exam, leading to the decision to use the writing sample 
only as a challenge to placement in the developmental course. 

Performance-based funding has been in effect in the state of Missouri for the past several 
years. Although state-supported institutions were mandated to collect student outcomes data, the choice of 
the particular method was left to the discretion of the individual institutions. The Academic Profile was 
selected at NWMSU based on the match between test content and the institution-wide goals, which 
include fostering students’ communication, problem solving, critical/creative thinking, and computer and 
cultural competence. The measure was also believed to be more practically feasible to administer than 
other similar instruments. Although faculty and administrators at NWMSU are relatively satisfied with 
the Academic Profile, there is an interest in supplementing the nationally normed measure with locally 
developed, more performance-based, criterion-referenced assessment. 

In 1993, the Outstanding Schools Act (OSA) called for the development of a new, primarily 
performance-based assessment system for Missouri’s public primary and secondary schools. The focus is 
on the development of assessment methods that extend beyond measuring students’ knowledge and skills 
to assessing their abilities to apply knowledge to different real world situations. Introducing more 
performance-based assessment measures into the state’s higher education system will naturally create 
much more continuity between the two systems. Dr. Oehler commented that frequent, 
authentic, curriculum-based assessments are needed to sufficiently monitor student progress toward target 
outcomes. He also discussed NWMSU’s experiments with modularized instruction, which provides 
students with a variety of options in terms of course delivery. In modularized instruction, students are 
expected to achieve certain skill sets or competencies; however, they are given the flexibility to select 
modes of instruction that fit well with their individual learning preferences. The introduction of 
modularized instruction raises many new questions pertaining to the design of assessment methods that 
enable students to best demonstrate the skills that they have acquired through diverse means. 



Use of the Data to Address Policy Issues 



Data generated through the various assessment activities at NWMSU have been used for 
funding and for accreditation purposes. Although the use of the end-of-core writing assessment data is not 
required for external decisionmaking, the data are often included in reports and have enhanced the image 
of the institution. Dr. Oehler noted that different levels of data aggregation are required for different 
internal administrators and external stakeholders. For example, assessment results provided for 
accreditation agencies and the board of regents are less detailed than what is provided to departments for 
formative purposes. 



Assessment data have been used to extensively modify the curriculum. Each academic and 
service unit participates in a regular planning process in which they are required to identify exactly who 
they serve, delineate what their expectations are for the population served, specify how the curriculum has 
been designed to meet their expectations, and identify how the objectives will be assessed. When the data 
suggest that expectations have not been met sufficiently, modifications are introduced. 

Dr. Oehler noted that one of the most positive effects of having instituted a comprehensive 
assessment program has been in the area of faculty development. The selection and development of 
assessment methods has necessitated much more collaborative work (e.g., to design rubrics for the writing 
assessments). He has been impressed by how the faculty have become more unified and consistent in their 
thinking about measuring student outcomes. Assessment is now a part of the culture of the university, and 
Dr. Oehler has noticed that many of the faculty members are now asking much tougher assessment- 
related questions than they have in the past. For example, previously faculty may have turned to 
assessment strategies to address questions such as, “What do students know?” or “What skills are they 
able to reliably demonstrate?” Now faculty are asking questions such as, “How can we determine whether 
we are maximizing every student’s potential?” Each semester the university sponsors a quality classroom 
symposium, which provides an excellent opportunity for faculty to share their ideas and learn from their 
colleagues. Previous topics have included issues such as the use of technology in the classroom, learning 
theory, and modularized instruction. 



Future Political Trends Expected to Have an Impact on Assessment 



Dr. Oehler commented on how he believes that the role of higher education is changing as a 
result of technological gains and rapidly expanding means for acquiring information. He expects that 
colleges and universities will be responsible for helping students to achieve skills and learn how to 
evaluate information, rather than functioning simply as the dispensers of knowledge. He also discussed 
how faculty development should focus on providing educators with a “tool box” of instructional methods 
that can be drawn upon when ongoing, frequent assessment data indicate that changes are in order. He 
believes that part of the business of “selling assessment” to faculty lies in fostering their professional 
development in such a way that they develop an extensive repertoire of skills for facilitating knowledge 
acquisition. 



Finally, Dr. Oehler mentioned that he believes that the different priorities of employers and 
policymakers need to be clearly communicated to academicians. However, assessment practices must be 
owned by faculty in order for the methods to be maximally effective. Therefore, faculty should be 
encouraged to be involved actively in the design and selection of assessment methods. 

Santa Fe Community College 

Gainesville, Florida 
Interviewee: Dr. Pat Smittle, 

Director of Academic Resources and Assessment 



Institutional Background 



Santa Fe Community College (SFCC) is a comprehensive postsecondary institution located 
in Gainesville, Florida, currently serving Alachua and Bradford counties in the north-central region of the 
state. Established in 1965, SFCC provides educational opportunities to 12,600 credit students and 20,000 
noncredit students. Fifty percent of SFCC’s student body is enrolled full-time, 54 percent are female, 18 
percent are non-white, 65 percent are in the 15-24 age range, and 44 percent are from low-income 
families. In addition to being accredited to offer the associate degree by the Commission on Colleges of 
the Southern Association of Colleges and Schools (SACS), SFCC is a charter member of the League for 
Innovation in the Community College. The nationally recognized League, composed of 20 community 
college districts in 14 states and Canada, has worked diligently to stimulate innovation and 
experimentation in community college education. Specific educational offerings at SFCC include the AA 
and AS degree programs, as well as certificate programs. More AA graduates continue their studies at the 
University of Florida than at any other institution. The AS and certificate programs are in the workforce 
development division and prepare students to begin employment immediately after completing their 
degrees. Approximately 64 percent of the students are enrolled in the AA transfer degree program, with 
36 percent enrolled in the workforce development programs. 



Description and History of the Assessment Method 

With an open-door policy, SFCC provides access to all high school graduates, many of 
whom are underprepared and placed in remedial courses to develop the basic competencies needed to 
succeed in college and the workplace. Dr. Pat Smittle was initially approached to discuss the use of data 
generated with the College-Level Academic Skills Test (CLAST) to address external policy questions, 
but he preferred to discuss the use of the Accuplacer, which has been used successfully for 2 years to 
screen incoming students for remedial coursework. Although CLAST is still administered at SFCC, it has 
been phased out considerably statewide because all Florida community colleges are now required to offer 
alternatives. Two thirds of the students at SFCC have opted not to take the CLAST. 

Dr. Smittle felt that SFCC has a unique story to tell relative to its remediation program 
because it has been highly successful in meeting the needs of diverse, traditionally underserved 
populations, particularly those of the economically disadvantaged. In addition to providing access to 
education for students from impoverished backgrounds, SFCC has developed a finely tuned remediation 
program that has resulted in both high retention and high achievement rates. Members of the community 
are particularly pleased, because many individuals who would otherwise not possess the knowledge and 
training needed to secure adequate employment are able to provide for themselves and their families 
without the aid of public assistance. 

SFCC has not only created a learning environment that accurately identifies students 
requiring remediation; its faculty and administrators have also worked to achieve a developmental 
curriculum that accommodates different learning styles and fosters success for academically 
disadvantaged students. 
Moreover, SFCC has achieved these goals without compromising the integrity of its academic standards 
and without incurring exorbitant costs. Retention rates are high, and test and GPA data clearly suggest 
that students enrolled in the remediation program are achieving skill levels that are comparable to those 
of their peers who test out of remediation. Stakeholders, particularly taxpayers, want institutions such as 
SFCC to reach disadvantaged populations, and the achievement of SFCC in this arena is the focus of this 
case study. 



The state of Florida mandated college placement testing, leaving the choice of the 
particular assessment method up to the discretion of the individual institutions. At that time SFCC 
adopted the ACT paper-and-pencil test. However, in 1996, the use of ETS’s computer-adaptive placement 
test, the Accuplacer, was mandated. Accuplacer is a four-component system, developed by the College 
Board and Educational Testing Service, to provide placement, advisement, and guidance information for 
students entering 2- and 4-year higher education institutions. Accuplacer includes the Computerized 
Placement Tests (CPTs), which are used to determine which course placements are appropriate for 
college students and whether developmental studies are needed. CPTs can also be used to monitor 
students’ in-course progress and to suggest whether further developmental studies are needed or whether 
a change in course assignment is recommended at the end of course completion. The CPTs include the 
following eight computer-adaptive test components: reading comprehension, sentence skills, arithmetic, 
elementary algebra, college-level mathematics, and levels of English proficiency with three components 
(reading skills, sentence meaning, and language use). 



Each individual test consists of a small number of items (between 12 and 17 depending on 
the test) drawn from a test bank of approximately 120 items. These questions are clustered in groups 
according to their difficulty, and the first item on a specific test is drawn from a group of items of 
moderate difficulty. Subsequent items are drawn from groups of less or greater difficulty depending on 
the response to previous items. The final test score is a statistical extrapolation from the score of the (T) 
questions and is reported as a score out of (N). This score is not a percentage; due to the adaptive nature 
of the test, a percentage calculation would not be meaningful. The best way to conceptualize the score is 
to view it as representing a position on a scale of difficulty, with a higher CPT score indicating a greater 
ability to handle difficult items. 6 
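
The item-selection logic described above can be illustrated with a minimal sketch. It is not the Accuplacer algorithm itself: the five difficulty groups of 24 items each (120 items in all), the one-step move to a harder or easier group, and the names `run_adaptive_test`, `item_bank`, and `answer_fn` are all assumptions made for illustration. The statistical extrapolation used to produce the reported score is not specified in this case study, so the sketch records only the path through the difficulty groups.

```python
import random


def run_adaptive_test(item_bank, answer_fn, num_items=12, rng=None):
    """Administer a simplified adaptive test (illustrative only).

    item_bank: list of lists; item_bank[d] holds the items in difficulty
               group d (0 = easiest). Hypothetical structure.
    answer_fn: callable(item, level) -> bool, True when answered correctly.
    Returns a list of (level, correct) pairs in order of administration.
    """
    rng = rng or random.Random()
    top = len(item_bank) - 1
    level = len(item_bank) // 2      # first item comes from a moderate group
    used = set()
    history = []
    for _ in range(num_items):
        # Never repeat an item within one test.
        candidates = [item for item in item_bank[level] if item not in used]
        if not candidates:
            break
        item = rng.choice(candidates)
        used.add(item)
        correct = answer_fn(item, level)
        history.append((level, correct))
        # Assumed rule: harder group after a correct answer, easier after a miss.
        level = min(level + 1, top) if correct else max(level - 1, 0)
    return history


# Hypothetical bank: 5 difficulty groups x 24 items = 120 items.
bank = [[f"group{d}-item{n}" for n in range(24)] for d in range(5)]
path = run_adaptive_test(bank, lambda item, level: level <= 2, num_items=12)
```

In this sketch a simulated examinee who can handle groups 0 through 2 but not group 3 settles into alternating between groups 2 and 3, which is the sense in which the result is a position on a scale of difficulty rather than a percentage correct.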



Use of the Data to Address Policy Issues 



The remediation program, formally entitled the college preparatory program at SFCC, 
represents the primary component of the Academic Resources and Assessment department. The mission 
of the college preparatory program is to emphasize skills, knowledge, and work habits that enable 
students with diverse backgrounds, abilities, and learning styles to continue their educational training, 
achieve in their chosen occupations, and engage in lifelong learning. The faculty and staff at SFCC are 
also committed to continuous evaluation and innovative revision of the educational environment in their 
efforts to maximally foster student goals. Four of the primary objectives of the college preparatory 
program are as follows: (1) to maintain and encourage an open-door policy while keeping high academic 
standards through the provision of assessment services, preparatory instructional activities, and adult 
education; (2) to design, implement, review, modify, and/or eliminate curricula that prepare students for 
the degree and certificate programs; (3) to foster learning of academic and work-related skills and habits 
that help students set and attain academic, career, and personal goals; and (4) to encourage and provide 
ongoing professional development for faculty. 



The college preparatory program incorporates multiple instructional methods to address 
different styles of learning, repetition of skills that build on a basic foundation, presentation of new 
material in small increments, structured activities, extensive feedback, and personalized attention. The 
comprehensive instructional model includes three components. First, large group lectures introduce skills 
and concepts (2 hours per week, taught by a full-time faculty member). Second, small group classes 
review material presented in the lecture component and help students apply it appropriately (3 hours per 
week, taught by adjunct faculty). Finally, individualized open labs provide students with additional 
opportunities to practice skills one-on-one with teaching assistants (average of 2 hours per week). SFCC 
has developed this concentrated and comprehensive program partially in response to legislative pressures 
for students to complete preparatory courses in one semester. 

Implications of the Data Generated 



Fall 1997 Accuplacer data revealed that 56 percent of entering students required remediation 
in at least one basic skill area. However, based on recognition that no single test always reflects a 
student’s competency level, a placement validation program is in place to ensure that students enrolled in 
the preparatory course are correctly assigned. Specifically, on the first day of classes, students are 
administered a test, which is frequently an alternate form of the final exam for the course. If they pass the 
test, they are moved into higher level college preparatory courses or into college-level classes. Studies 
conducted over the past few years have indicated that very few students are inappropriately placed. For 
example, in fall 1997, only 4 percent of those enrolled in the writing preparatory course tested out and 
were moved up. Although the data suggest very few misplaced students, the faculty at SFCC have 
continued the practice, as it helps students accept their need for remediation in addition to ensuring that 
the content of the Accuplacer remains consistent with the curriculum. 

SFCC has been successful in fulfilling its program mission of preparing academically 
underprepared students for college-level work and various employment contexts. Data generated to 
answer the question of “how well do college prep students perform as they move through the college- 
level program?” have been very favorable. Recent evaluation results indicate a 64 percent passing rate in 
the college preparatory course, with a 3.4 percent official withdrawal rate. Recent data have further shown 
that preparatory students’ passing rates in subsequent courses (57 percent) met or exceeded the overall 
passing rate for students not requiring remediation (55 percent). In the English language skills courses, 
the rates were 66 percent and 57 percent for preparatory and nonpreparatory students, respectively. 

With regard to CLAST, data discrepancies between the college preparatory and 
nonpreparatory students were still evident; 63 percent of students who were enrolled in at least one 
preparatory course passed all parts of the CLAST, compared to 89 percent of those not requiring 
remediation. Students who fail the CLAST are required to remediate the skills in a CLAST lab. On the 
essay portion of the CLAST, data have been more supportive of the efficacy of the program. Specifically, 
in October 1997, 93 percent of former college preparatory students (compared to only 85 percent of the 
nonpreparatory students) passed the essay portion. Data generated in the AA transfer program indicate 
that former prep and nonpreparatory SFCC students achieve comparable GPAs in the state university 
system (both slightly under 3.00). This finding is particularly exciting because the college preparatory 
students would not have been admitted into the state university system due to their low placement scores. 

Dr. Smittle noted several of the elements that combine to create the strong developmental 
program that is now in place at SFCC. These include administrative support, structured courses, 
mandatory counseling and placement, the award of college credit for college preparatory classes, the 
implementation of varied instructional methods, the use of instructors who volunteer to teach remedial 
classes (as opposed to being assigned), peer tutors, close monitoring of student behaviors and the use of 
intervention, interfacing the program with subsequent courses, and extensive program evaluation. Other 
strengths include the following: (1) a strong research foundation, with the development and maintenance 
of the program based on the work of national leaders in the field of developmental education; (2) the 
institution of a career/academic planning (CAP) component of the program, designed to help students 
choose appropriate career/academic paths based on their interests, academic competencies, and the 
available SFCC programs; and (3) collaborative efforts with area high schools. 

SFCC administers the Accuplacer to 10th-grade students and conducts high school counselor 
workshops. The primary objective of this feature of the program is to provide feedback to students 
pertaining to their readiness for college-level work, enabling them to remediate skill deficiencies while 
still in high school. This feature of the program was initiated 5 years ago, and the idea became a part of 
state legislation in 1996-97. Since this project was initiated, the number of entering freshmen needing 
remediation studies has dropped by 12 percent. Dr. Smittle noted that additional benefits of enrollment in 
the program are that students coming from disadvantaged environments develop excellent social skills 
and gain confidence and self-esteem in addition to developing academically. 

Considerable media attention in recent years has focused on the remediation costs in 
community colleges. Yet, the SFCC data indicate that the programs are not costly, with fewer than 3 
percent of the 1996-97 total college budget being spent on the college preparatory program (for 6,216 
seats in remedial courses and related activities). The fall 1997 Accuplacer data revealed that 56 percent of 
entering students required remediation in at least one basic skill area, and the SFCC college preparatory 
program is clearly playing a vital role in the college mission to provide access to quality postsecondary 
education for these underprepared students. 



Southeast Missouri State University 

Cape Girardeau, Missouri 

Interviewee: Dr. Dennis Holt, 

Associate Provost 



Institutional Background 



Southeast Missouri State University (SMSU) is a public institution founded in 1873 and 
located in Cape Girardeau, a community of 40,000 that serves as the major commercial and cultural 
center between St. Louis, Missouri, and Memphis, Tennessee. The university is a comprehensive state 
institution with over 150 academic programs; it offers associate’s, bachelor’s, master’s, and specialist 
degrees, along with a doctoral program in education. With an undergraduate student body of 
approximately 8,200, SMSU is primarily a regional institution and maintains a strong commitment to the 
25 surrounding counties of southeast Missouri. The North Central Association of Colleges and Schools 
accredits the university. 



Description and History of the Assessment Method 




Performance-based funding in the state of Missouri requires the use of at least one norm- 
referenced test, with $100 of support awarded for each student who scores at or above the 50th percentile. 
The first measure adopted was the ACT-COMP. However, it was discontinued based on practical 
concerns revolving around the time and cost of administration as well as reservations about the validity of 
the measure. The ACT-COMP was replaced by the short form of the Academic Profile, and this year the 
institution has decided to switch to the California Critical Thinking Test after piloting the measure and 
analyzing the results. The cognitive part of the exam was administered to students in their freshman 
seminar class and to seniors in an interdisciplinary senior course, with significant differences detected 
between the two groups. The decision to adopt the California Critical Thinking Test was also based largely 
on the cost; the Academic Profile was believed to be too expensive, given the limited information derived 
from the assessment. Essentially, data generated from the Academic Profile were not found to be useful 
for program improvement. Existing comparison data with norming groups suggest that student 
competencies at SMSU are comparable to those of students attending similar institutions. 
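
The funding rule stated at the start of this paragraph ($100 for each student scoring at or above the 50th percentile) amounts to a simple count. The sketch below is only an illustration of that arithmetic; the function name `performance_funding` and the sample percentile values are hypothetical and do not come from SMSU data.

```python
def performance_funding(percentiles, award_per_student=100, cutoff=50):
    """Return the dollars earned under the rule described in the text:
    a fixed award for each student at or above the cutoff percentile."""
    qualifying = sum(1 for p in percentiles if p >= cutoff)
    return qualifying * award_per_student


# Hypothetical national percentile ranks for four students.
sample = [49, 50, 75, 10]
funding = performance_funding(sample)  # two students qualify
```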



Because SMSU has now discontinued administration of the Academic Profile and the 
university’s experience with the California Critical Thinking Test has been limited, the focus of this case 
study is on SMSU’s writing proficiency exams, which have been used for more than a decade. Although 
data derived from the writing assessment program are not used specifically for performance funding or 
for accreditation purposes, the state and accreditation boards have been very pleased with SMSU’s work 
in this area. 



A 1984 policy required all students to pass a writing proficiency test after completing 75 
credit hours and prior to graduation. In 1985, state funding was secured to begin the writing outcomes 
program at SMSU, with the idea that it was to serve as a model for other institutions in the state. All 
entering freshmen take a holistically scored, timed essay exam (WP001), with the prompt requiring a 
personal-type writing sample. For example, students might be requested to describe their views on the 
nature of competition. Students are tested again as they exit the capstone English composition course 
(typically at the end of the freshman year or at the end of the first semester of the sophomore year). The 
writing proficiency exam at this point (WP002) involves a two-part, timed essay test. There is a 
referential or source-based analytic prompt, which requires the students to read a number of excerpts and 
then take a position on an issue, supporting their viewpoint with correct referencing of information from 
the excerpts. The second segment is a personal essay similar in form to the one used with entering 
freshmen. Finally, the third writing proficiency exam (WP003) is administered upon completion of 75 
credit hours, with the format being identical to that of the WP002. Unlike the WP003, the WP002 is not a 
barrier exam but functions as a warning to students who may need enrichment experiences prior to taking 
the last exam. 



Students who perform marginally or fail the WP002 receive a letter inviting them to visit the 
writing center to receive feedback on the exam. Additional help with writing is also made available as 
needed. Scores on the WP002 exam account for 5 percent of the students’ grades in the capstone course. 
Students must demonstrate competency on the WP003 test or, in the event that they fail, on an approved 
portfolio option in order to graduate. Longitudinal studies conducted with data generated from the writing 
proficiency exam administered at different points in SMSU students’ college careers indicate relatively 
high scores on the analytic essay segment at the end of the capstone course and modest, statistically 
significant gains between the WP002 and WP003 administrations. 

Rubrics have recently been developed for critical thinking, reasoning, and analysis (similar 
to the rubrics used on the GMAT and the ETS Tasks in Critical Thinking), enabling the essay exams to 
serve the dual purpose of measuring writing proficiency and critical thinking skills. Significant 
correlations were observed between scores on this locally developed assessment and the data derived 
from the piloting of the California Critical Thinking Test. Dr. Holt noted that SMSU is excited about 
validating its criterion-referenced measure with scores derived from a nationally normed test. 

SMSU also administers graduate follow-up surveys and enrolled-student surveys that request 
students to report the degree to which they believe their coursework has enhanced their critical thinking 
and writing skills. Student self-report data have been favorable. 



Use of the Data to Address Policy Issues 



SMSU staff believe that their efforts in outcomes assessment generally and in the domain of 
writing assessment specifically have been ambitious, successful, and highly visible, resulting in positive 
effects on the reputation of the institution at both the state and national levels. SMSU has been sensitive to 
the skills deemed essential for college students by external stakeholders. For example, first the university 
addressed assessment of writing competency in a systematic and comprehensive manner, and now it is 
concentrating its efforts on closely examining assessment of critical thinking competencies. Another 
direction that exemplifies SMSU’s awareness of current political issues pertains to its recent efforts 
directed toward conducting controlled studies of the use of technology in the classroom. Dr. Holt noted 
that three recent proposals for conducting such experiments have received state funds. 



Future Political Trends Expected to Have an Impact on Assessment 

When Dr. Holt was asked about future developments likely to have an impact on assessment, 
he mentioned a statewide cooperative project that administrators representing 2- and 4-year institutions 
throughout the state are currently working on. The project focus is on the development of core 
educational objectives and the identification of common assessment methods to address the issue of 
controlling the quality of students transferring from community colleges to institutions granting 
bachelor’s degrees. 

In his advice for policymakers regarding the assessment of critical thinking and writing, Dr. 
Holt had a word of caution for presidents of institutions and coordinating boards regarding the 
overinterpretation of test scores. He voiced some concern that overzealous efforts 
to demonstrate student achievement may lead higher education officials to lose sight of the limitations 
of the methods from which the data are derived. 



Tennessee State University 

Nashville, Tennessee 
Interviewee: Dr. Dennis Gendron, 
Associate Vice President for Academic Affairs 



Institutional Background 



Tennessee State University (TSU) is a major state-supported, urban, land-grant, and 
comprehensive university governed by the Tennessee Board of Regents. TSU provides instructional 
programs and statewide cooperative extension services and conducts agricultural research. As a 
comprehensive institution, TSU provides programming in agriculture, allied health, arts and sciences, 
business, education, engineering and technology, home economics, human services, nursing, and public 
administration. The institution is comprehensive at the bachelor’s and master’s levels; however, doctoral 
programs are only available in the education and public administration areas. As an urban institution 
located in the capital city, TSU provides both degree and nondegree programs (day, evening, weekend, 
and at off-campus sites) that are appropriate and accessible to a working population. Moreover, TSU 
serves a diverse population of students — traditional, nontraditional, commuter, residential, undergraduate, 
graduate, nondegree, full-time, and part-time. Fall 1997 enrollment data indicate that 71 percent of the 
TSU student population is black, 25 percent is white, and 4 percent are of other races. Further, 65 percent 
of the students are enrolled full time, and 35 percent attend part time. 



Description and History of the Assessment Method 



Dr. Gendron indicated that the ACT-COMP has been used to address policy questions for 
the past 10 years, beginning when the use of the measure was mandated by the state. COMP data are used 
to assess the efficacy of the core curriculum as exemplified by the basic skills demonstrated by graduating 
seniors (essentially an exit test). Four years ago, the state allowed institutions to substitute the COMP 
with another measure if they so desired. Several schools switched over to the College-BASE. The 
decision to continue with the COMP was made at TSU largely based on its interest in conducting 
longitudinal studies of program effectiveness. TSU has recently adopted the ACT Critical Thinking 
measure (Critical Thinking Assessment Battery, CTAB). Use of a critical thinking test was not mandated 
by the state; however, TSU is working to develop critical thinking across the curriculum and selected 
CTAB as its assessment method in efforts to modify the curriculum and to develop new teaching methods 
that facilitate critical thinking in different content areas. The state has been supportive of TSU’s efforts, 
providing financial incentives for the development of new curricula, including funding to support faculty 
leave to attend critical thinking workshops. The faculty who attend training sessions subsequently work 
with their colleagues to share their knowledge. Because the CTAB was instituted only a year ago, the 
focus of this report is on the COMP. 



Use of the Data to Address Policy Issues 



Dr. Gendron indicated that test data generated by the COMP are currently used to address a 
number of policy issues. In particular, state funding is based on performance in the six skill areas of the 
COMP, and the data are also used for accreditation purposes (SACS), to develop and maintain institutional 
effectiveness standards, and to promote the reputation of TSU at the local, state, and national levels. 
Faculty and administrators at TSU 
are generally very satisfied with the COMP. Data generated from the COMP have been used to provide 
diagnostic feedback to students (they are provided with scores for the different areas), for advancement of 
individual students, and to improve and restructure the curriculum, in addition to being used to augment 
and reallocate financial resources and for accreditation purposes. Although no summative personnel 
evaluations (e.g., promotion and tenure decisions) are made based on the data generated, test scores are 
used for faculty development purposes. Dr. Gendron indicated that different forms of data reporting are 
generally needed to answer the questions of different stakeholders. For example, U.S. News and World 
Report requires extensive reporting, whereas other agencies are content with data summaries (available 
over the Internet). All of the data collected at TSU are used, and no attention has been devoted to deriving 
assessment data to answer policy questions by any means other than traditional forms of assessment. 



In terms of data that are not available, TSU is currently lacking a rising sophomore test. TSU 
has felt the need to assess student competencies upon completion of the core curriculum and prior to 
entering the majors. The university plans to initiate use of the Academic Profile in the near future. 
Because norm-referenced results have not conveyed enough information, TSU is eager to implement the 
criterion-referenced Academic Profile. Dr. Gendron indicated that no plans for developing tests locally to 
generate data needed to address policy questions exist; however, TSU will consider a locally developed 
measure of critical thinking if the CTAB turns out not to meet its needs. TSU personnel have been 
working with individuals affiliated with East Tennessee State University, Tennessee Tech, and Middle 
Tennessee State University in the piloting of the CTAB. If a change is made, it will be made in 
cooperation with the representatives of these other institutions. 

Dr. Gendron responded positively to the question about the degree to which TSU students 
are developing the skills and knowledge necessary to function well in various employment contexts. He 
noted in particular that employers are very satisfied with the values and social skills of TSU graduates. 
Student competency in interpersonal or social contexts is supported by the Functioning in Social 
Institutions COMP subscale data. Although employers are generally satisfied, employer surveys have 
suggested the need for more preparation in the areas of critical thinking, writing, and technology. COMP 
data suggest that the 
areas where students perform the lowest are in the arts and humanities, but these areas have not been of 
serious concern to the majority of employers. Alumni surveys further indicate that students are satisfied 
with their education and feel well prepared for various work settings. Most of the students attending TSU 
represent the first generation in their families to attend college, and a large percentage are from 
economically disadvantaged backgrounds, necessitating high levels of dependence on student loans. As a 
result, most TSU graduates feel compelled to work immediately after graduating in order to repay loans. 
The majority also tend to become rapidly established in their careers and are generally not interested in 
attending graduate school. Students who pursue graduate studies are self-selected, highly competent, and 
therefore very successful. 

With regard to data generated to examine the relative efficacy of different teaching methods, 
Dr. Gendron noted that although TSU is moving in the direction of more Internet-based instruction, 
controlled studies comparing student outcomes in technologically delivered versus traditionally delivered 
classroom formats have been limited. Studies comparing student satisfaction and academic performance 
in distance education courses versus traditional classroom settings have revealed lower satisfaction and 
performance with the distance learning format. In general, the students are dissatisfied with the lack of 
personal attention associated with distance learning, and presumably this dissatisfaction negatively affects 
performance. Although some instructors have attempted to compensate by traveling to different sites, this 
strategy is construed as defeating the purpose of distance learning and has been seen as an extra burden by 
faculty. 



With reference to logistical problems encountered in the administration of the COMP and 
other standardized tests, Dr. Gendron noted that a primary problem is with the listening segments of the 
tests. Many TSU students have poor listening skills, and when they have to sit still and concentrate on a 
passage that is delivered in a standardized, often monotone style, the students frequently lose their 
concentration. Discussions about resolving this problem have focused on the use of headphones, based on 
the assumption that more direct delivery would reduce distractibility. Dr. Gendron commented on the 
constant and varied stimulation that this generation of students has grown up with and noted how difficult 
it is to capture and maintain the students’ attention for any length of time. 
Future Political Trends Expected to Have an Impact on Assessment 



When asked about advice for policymakers regarding the assessment of critical thinking or 
writing, Dr. Gendron commented that learning by rote is no longer useful in our rapidly changing and 
technologically advanced society. He believes that new methods designed to teach critical thinking skills 
such as synthesis and evaluation that go beyond analysis skills are greatly needed. He further noted that 
students must learn to quickly assimilate and discriminate information. From his perspective, students 
must be able to change their point of view for different audiences. Students need to be highly skilled users 
of the Internet, graphical programs, and presentation software, such as PowerPoint, in addition to being 
skilled writers, with the use of e-mail becoming so prevalent. Dr. Gendron felt it was impossible, from his 
vantage point anyway, to try to predict what assessment will be like in the year 2020, given the changes 
that have transpired over the past 2 decades. 



Washington State University 

Pullman, Washington 

Interviewee: Dr. Bill Condin, 
Writing Program Director 



Institutional Background 



Washington State University (WSU) is a land-grant university founded in Pullman in 1890. 
The university became a multicampus system in 1989 with the establishment of campuses in Spokane, the 
Tri-Cities, and Vancouver. Approximately 17,000 students (15,000 undergraduate and 2,000 graduate) 
are enrolled at WSU, with the majority on the Pullman campus (14,100). The branch campuses primarily 
serve students who are geographically restricted and would otherwise have limited educational 
opportunities. Enrollment is expected to double by the beginning of the next century as facilities and 
degree offerings are expanded. The university is composed of eight colleges, a graduate school, and the 
Intercollegiate Center for Nursing Education. WSU is accredited by the Commission on Colleges of the 
Northwest Association of Schools and Colleges, and many departments and colleges are accredited by 
professional accrediting associations recognized by the Council on Postsecondary Accreditation. The 
institution is also a member of the National University Continuing Education Association. 

Liberal arts and sciences have always been strongly emphasized in the curriculum, together 
with business, education, architecture, pharmacy, nursing, and the traditional land-grant programs in 
agriculture, engineering, home economics, and veterinary medicine. There are nearly 100 major fields of 
study, with bachelor’s degrees offered in all areas and master’s and doctoral degrees available in the 
majority of fields. WSU has developed an extensive writing program that is nationally recognized for its 
innovation, scope, and effectiveness. 



Description and History of the Assessment Method 



The focus of this case study is on assessment of student writing competencies in the context 
of the WSU writing program, which has successfully incorporated writing throughout the curriculum 
(both across all disciplines and throughout the 4 years of undergraduate training). 

The WSU writing program incorporates extensive, challenging writing experiences with a 
program of writing assessment that facilitates identification of students who need help with writing at 
various points in their college careers, while recognizing students with outstanding writing skills. The key 
features of the writing program at WSU include the following: (1) a writing placement exam; (2) a solid 
foundation in college-level writing in introductory composition courses that are tailored to different 
beginning competency levels; (3) a general education or honors program curriculum with a substantial 
amount of writing embedded throughout the coursework; (4) a junior-level diagnostic assessment of 
writing, referred to as the university writing portfolio and incorporating both a portfolio component and a 
two-part timed essay; and (5) two writing intensive courses in which students learn the forms of writing 
that are used in their chosen major fields. 

The writing placement exam requires students to write two essays that are specifically 
designed to match the writing assignments encountered in the beginning English composition courses. 
The 2-hour timed exam begins with a passage of reading material and requires students to respond to the 
excerpt using college-level intellectual strategies (summarize, compare and synthesize different 
viewpoints, solve problems, etc.). One essay is an argument or analysis, and the other is essentially a 
reflection, requiring students to refer back to what they wrote for the first essay. The exams are diagnosed 
by experienced English faculty, with the evaluation criteria focusing on the development of a main point, 
organization, persuasion, and evidence of having been proofread. 

Initial English writing coursework is designed to meet the needs of students who vary quite 
dramatically in terms of their readiness for the challenges inherent in academic writing, from requiring 
additional assistance in discrete areas of composition (focus, organization, support, style, mechanics, etc.) 
to readiness for the accelerated honors course. An introduction to academic writing for nonnative 
speakers of English is offered as well. Most first-year students enroll in a version of the English 101 
course, which is considered the cornerstone of general education at WSU. The focus of this composition 
course is on aiding students in the transition to analysis, inquiry, and argument from the content writing 
that is emphasized in high school. Subsequent general education courses provide additional opportunities 
to build on writing competencies fostered in the foundation courses. Writing-intensive assignments in the 
majors are reviewed, critiqued, and revised for grading and assume various forms: research, synthesis, 
argument papers, proposals, laboratory and technical reports, memoranda, and progress notes. Dr. Condin 
noted that one goal of the WSU writing program has been to have students write at least 100 pages during 
their college careers. 

Prior to finishing 61 credit hours, students submit a writing portfolio that includes three 
papers from courses taken at WSU and two timed essays. The portfolio is a mid-career assessment of 
writing skills (following the lower division general education courses and preceding upper division 
coursework in the major). The course papers must be signed off by the teacher of the course as 
acceptable or outstanding and may be library or laboratory research papers, reviews or critiques, 
technical reports, proposals, essays, case studies, fictional stories, or student self-evaluations. The 
examination component includes a 90-minute argument-type essay based on a short passage of prose and 
a 30-minute self-evaluation piece. This format is similar to the writing placement examination and 
enables longitudinal study of student writing competency. Portfolios are read by trained university faculty 
representing virtually all academic disciplines and are judged “pass,” “pass with distinction,” or “needs 
work.” 



Although the portfolio is designed as a diagnostic tool to facilitate the provision of support to 
writers needing additional help as they advance into their major courses and as recognition for exemplary 
writers, it is also a graduation requirement. Students must receive at least a “pass” on the university 
writing portfolio to graduate. Students who do not pass (approximately 10 percent each year) must take 
General Education 302, which is a one-credit writing group that emphasizes revision, feedback, 
self-assessment, and collaboration. The university writing portfolio serves as a diagnostic aid to ensure that all 
students have enough support to respond successfully to the writing experiences presented in the major. 
The portfolio is also designed to commend the top 10 percent of students, who receive the designation 
“pass with distinction” on their transcripts. Beginning in the spring of 1996, students submitting the five 
best portfolios were each awarded a cash prize of $100. 

The WSU portfolio scoring system was thoughtfully conceived and makes effective use of 
faculty time and energy through the use of a two-tier rating system. In the first tier, an initial group of 
faculty assigns ratings of “needs work,” “pass,” or “pass with distinction.” For portfolios receiving a 
pass, this is the end of the assessment process. However, the portfolios in the bottom and top categories 
are assessed by a second group of raters prior to officially awarding the “needs work” or “pass with 
distinction” designations; this represents the second tier of the process. The system allows for more 
faculty time to be spent with the less typical portfolios, facilitating finer discriminations. 
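
The two-tier flow described above can be sketched in a few lines of code. This is only an illustrative sketch, not WSU's actual procedure: the function name rate_portfolio, the numeric "score" field, the cutoff values, and the rule that the second-tier rating becomes the final designation are all assumptions introduced here for illustration.

```python
from typing import Callable, Dict

# A rater is any function that maps a portfolio to one of the three ratings.
Rater = Callable[[Dict], str]
VALID_RATINGS = {"needs work", "pass", "pass with distinction"}


def rate_portfolio(portfolio: Dict, first_tier: Rater, second_tier: Rater) -> str:
    """Return the final rating under a two-tier scheme.

    A first-tier "pass" is final. Any other first-tier rating is sent to
    the second tier, whose rating is taken as final (an assumption; the
    report does not say how disagreements are resolved).
    """
    initial = first_tier(portfolio)
    if initial not in VALID_RATINGS:
        raise ValueError(f"unknown rating: {initial!r}")
    if initial == "pass":
        return initial  # typical portfolios stop here, saving faculty time
    final = second_tier(portfolio)
    if final not in VALID_RATINGS:
        raise ValueError(f"unknown rating: {final!r}")
    return final


# Hypothetical raters: the "score" field and cutoffs are invented for the demo.
def demo_first_tier(portfolio: Dict) -> str:
    score = portfolio["score"]
    if score < 4:
        return "needs work"
    if score > 8:
        return "pass with distinction"
    return "pass"


def demo_second_tier(portfolio: Dict) -> str:
    score = portfolio["score"]
    if score < 3:
        return "needs work"
    if score > 9:
        return "pass with distinction"
    return "pass"


if __name__ == "__main__":
    for score in (2, 3.5, 6, 9.5):
        result = rate_portfolio({"score": score}, demo_first_tier, demo_second_tier)
        print(score, "->", result)
```

The point of the design is that only portfolios outside the middle category consume a second reading, which is how the system concentrates faculty time on the less typical cases.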



Use of the Data to Address Policy Issues 



In the late 1980s, the state of Washington mandated entry, mid-career, and end-of-program 
assessment of student academic competence, although the actual form of the assessment was left to the 
discretion of the various institutions. The university writing portfolio was approved in the spring of 1989 
by the WSU faculty senate and became effective for students entering WSU in fall of 1991. The first 
assessment occurred in spring 1993, and currently more than 3,000 students complete the examination 
annually. The purpose of the initial and continued use of assessment data derived from the portfolio 
assessment was to acquire information needed to fortify the curriculum in order to effectively foster 
student writing skills. Although the portfolio assessment uses a holistic scoring approach, students 
requesting diagnostic feedback are provided the opportunity to have a conference with the faculty raters 
to clarify problem areas. 

Every 2 years, a comprehensive self-study is conducted. The results have been very positive, 
suggesting substantial gains in student writing proficiency based on curricular experiences. The data 
derived have also been invaluable in generating educational assessment data needed for accreditation. 
Alumni survey data have further illustrated an increase in student satisfaction pertaining to the 
development of their writing skills while in attendance at WSU. Specifically, in the late 1980s, alumni 
generally expressed low levels of satisfaction with the WSU undergraduate writing skills training they 
had received, whereas recent alumni survey data have conveyed high levels of satisfaction pertaining to 
educational training in writing. Stakeholders such as the Higher Education Coordinating Board, 
taxpayers, employers, and graduate program personnel have been very satisfied with the writing abilities 
of WSU graduates. 



Implications of the Data Generated 



The data have suggested that changes may need to be implemented to meet more effectively 
the writing needs of nonnative speakers. Further, Dr. Condin noted the need for data related to the degree 
to which the program is effectively serving other segments of the student population, such as rural 
residents, transfer students, and economically disadvantaged individuals. WSU is developing a scoring 
rubric to assess critical thinking ability based on student responses to the timed essay portion of the 
portfolio and the placement test. The development of this rubric, which is in the final stages of pilot 
testing, is in response to the recent emphasis of various stakeholders on critical thinking skills. 



Amazingly, the entire writing program runs on an annual budget of only $80,000, primarily 
because students are required to pay for each assessment ($9 for the placement exam and $12 for the 
portfolio). Faculty involvement in the scoring of the assessments is voluntary, and faculty are paid by 
the hour. With only a half day of training required and involvement construed as service to the university, 
WSU has not experienced any difficulty recruiting interested faculty. Those who are most actively 
involved each year also receive letters acknowledging the time they have devoted to the program. 



In terms of logistical problems, Dr. Condin noted that WSU has experienced some difficulty 
keeping the portfolio as a mid-career assessment, with approximately 25 percent of the students putting it 
off until their senior years. As a result, the assessment ends up functioning as a barrier test for some 
students rather than as the mid-career diagnostic that it was designed to be. In an attempt to rectify this 
problem, WSU is planning programs to educate the students regarding the benefits of completing the 
portfolio at the most appropriate time. 



The success of the writing program, as reflected by student achievement, faculty investment 
and support, practical feasibility, and innovative features such as the use of an online writing lab, is truly 
commendable. Both the writing program and the assessment methods provide a useful and realistic model 
for other institutions considering implementing a program in which extensive coursework in writing is 
tied very closely to the assessment of student competency. 



Future Political Trends Expected to Have an Impact on Assessment 



Dr. Condin expects stronger external demands on assessment in the future. He believes that 
the writing assessments currently in place should more than satisfy the need for student writing 
competency data. He further anticipates that WSU will be required to invest energy in documenting 
student learning in other areas. In terms of advice for policymakers regarding assessment of writing, Dr. 
Condin recommends greater emphasis on performance-based assessment using actual curricular products, 
along with involvement of a broad group of faculty members holding various disciplinary affiliations. 
APPENDIX A 



Case Study Questions 

National Postsecondary Education Cooperative 
Student Outcomes Pilot Working Group: 
Cognitive and Intellectual Development 



Assessment Method: 

Name of Institution: 

Name of Interviewee: 

Title of Interviewee: 

Policy Questions: 

1. What assessment data are actually being used to answer policy questions? 

2. Was the assessment mandated? By whom? 

3. If the assessment was mandated, was the use of this particular assessment method mandated? 

4. If the particular assessment method was not mandated, what criteria were used to select the 
assessment method? 

Match with content knowledge represented by the current curriculum? 

Match with special cognitive skills? 

Match with skills/knowledge believed to be prerequisite for entering the work world after 
graduation? 

Other selection criteria? 



Please specify. 



5. What were the initial intended uses of the test data? 



6. What policy questions were initially intended to be addressed by the data derived? 



7. How were or are the data being used? (Has the institution found the assessment method useful?) 



To provide diagnostic feedback to individual students? 



For advancement of individual students? 



To improve, restructure existing programs (e.g., result in new course offerings)? 



To augment, reallocate resources? 
For accreditation purposes? 

Please specify 

For external constituents such as state boards? 

For summative personnel evaluation purposes (e.g., promotion and tenure decisions)? 

Other uses? 

Please specify 

8. How can the assessment results affect the institution (positively and negatively)? 

9. Are different forms of data reporting generally needed to answer the questions of different 
stakeholders? 

10. What data exist that are not used, but could theoretically be applied to answer policy questions? 

11. What data do not exist at your institution, but are needed to answer policy questions? 

12. Are there appropriate existing measures to generate the needed data, or are there plans to develop 
tests locally to address policy questions that are currently unanswerable given the existing testing 
program? 

13. Has any attention been devoted to deriving assessment data to answer policy questions by any means 
other than with the use of traditional forms of assessment? 

14. Are there policy questions being answered that have not actually been asked? If so, what are they? 

15. Do the data suggest that students are developing the needed skills and knowledge necessary to 
function well in various employment contexts? What are these skills? 

16. Do the data suggest that students are developing the needed skills and knowledge to be successful in 
graduate school? What are these skills? 

17. Do the data suggest that your students are developing the skills and knowledge needed to fit well into 
society and to make meaningful contributions? What are the skills that suggest high social 
adaptability? 

18. Have data been generated to examine the relative efficacy of different teaching methods (e.g., 
technologically-based versus traditional instruction) in the fostering of skills deemed important by 
stakeholders? 

19. Are stakeholders generally satisfied with the return on their investment, as exemplified by the impact 
of educational experiences at your institution on students’ intellectual and personal growth? If not, 
what are the areas of discontent? 

20. What advice do you have for policymakers regarding assessment of critical thinking (or writing)? 

21. What future developments might have an influence on assessment at your institution? 




22. Do you see any immediate developments? 

23. If we did an assessment in the year 2020, what might it “look” like? 

Operations Questions: 

1. What was the cost of the test? 

2. Were there any special features involved in the assessment procedure (e.g., addition of local questions 
to a commercial test, student incentives, etc.)? 

3. What were the defining demographic characteristics of the student population? 

4. How was the sample derived (number, percent of the full population, and method — random, 
stratified, etc.)? 

5. When, where, and how was the test administered? 

6. How frequently were the students administered the test? 

7. What logistical problems, if any, occurred in the testing process? 



APPENDIX B 



NPEC Case Study Categories 



I. Institutional Background 

Location 

Size, type of institution 
Student population served 
Programs offered 
Accreditation 

II. Description of the Assessment Method 

III. History of the Assessment Method 



Mandated by the state vs. selection by the institution 

Time frame, use of other measures prior to existing, and reasons for changes 
Satisfaction with the current measure for generating needed data, plans for changes 

IV. Use of the Data to Address Policy Issues 



Description of the most relevant policy questions 
How the data are currently being used 
Secure, reallocate funds 
Accreditation 

Students (placement, diagnostic feedback, advancement, graduation) 
Improve, restructure programs 
How the data are likely to be used in the future 



V. Implications of the Data Generated 

Development of student competencies (employment, academic, and personal competencies) 
Need for different forms of data, new/innovative methods 

VI. Future Political Trends Expected to Have an Impact on Assessment 

Immediate 

Long-range 
APPENDIX C 



Definitions of Critical Thinking, Problem Solving, and Writing 



Critical Thinking: Critical thinking is defined in seven major categories: interpretation, analysis, 
evaluation, inference, presenting arguments, reflection, and dispositions. Within each of these categories 
are skills and subskills that concretely define critical thinking. No single test measures every aspect of 
critical thinking; in fact, even with all of the tests combined, not all critical thinking skills are assessed. Although a 
single comprehensive test is not available, many tests are still adequate measures of some critical thinking 
skills. 



Problem Solving: Problem solving is defined as understanding the problem, being able to obtain 
background knowledge, generating possible solutions, identifying and evaluating constraints, choosing a 
solution, functioning within a problem-solving group, evaluating the process, and exhibiting problem 
solving dispositions. There is no fully adequate measure of problem-solving skills; the most 
comprehensive measure available is the ETS Tasks in Critical Thinking. 

Note: There is considerable overlap in critical thinking and problem solving. For 
instance, the ability to state a problem; evaluate factors surrounding the problem; 
create, implement, and adjust solutions as needed; analyze the process and fit of a 
solution; as well as having an active inclination towards thinking, solving problems, 
and being creative are all skills necessary for both problem solving and critical 
thinking. Therefore, clear distinctions between problem solving and critical thinking 
may prove difficult to assess and tease apart in application. 



Writing: Attempts to define writing often focus on the products (essays, formal reports, letters, 
scripts for speeches, step-by-step instructions, etc.) or the content of what has been conveyed to whom. 
When writing is defined only as a product, elaboration of the construct tends to entail specification of 
whether particular elements, such as proper grammar, variety in sentence structure, organization, etc., are 
present (suggestive of higher quality writing) or absent (indicative of lower quality writing). Attention is 
given to describing exactly what is generated and detailing the skill proficiencies needed to produce a 
given end-product. Although educators, researchers, and theorists in the writing field tend to prefer a 
process-oriented conceptualization of writing, research suggests that employers in industry are more 
interested in defining writing competence with reference to products (Jones et al. 1995). 

A recent report on national assessment of college student learning (Jones et al. 1995) provided a 
comprehensive definition of writing that, in addition to including several subcomponents of the process, 
delineates critical aspects of written products. The general categories of key elements composing the 
construct of writing produced by these authors include awareness and knowledge of audience, purpose of 
writing, prewriting activities, organizing, drafting, collaborating, revising, features of written products, 
and types of written products. These researchers developed this definition based on an extensive review of 
relevant literature and feedback from a large sample of college and university faculty members, 
employers, and policymakers representative of all geographic regions in the U.S. Stakeholders were asked 
to rate the importance of achieving competency on numerous writing skills upon completion of a college 
education. Jones et al. (1995) found that in every area of writing there were certain skills that each 
respondent group believed were essential for college graduates to master in order to facilitate effective 
functioning as employees and citizens. However, there were areas of contention as well. For example, 
employers and policymakers placed less emphasis on the importance of the revision process, tending to 
expect their graduates to be able to produce high-quality documents on the first attempt. In addition, 
employers rated the ability to use visual aids, tables, and graphics as more important than faculty 
members did, and faculty members attached more importance to being able to write abstracts and evaluations. 
The resulting definition produced by Jones et al., which only includes skills that were universally 
endorsed by all three groups, is based on a consensus derived empirically from groups that possess very 
different interests regarding the development of writing skill competency through undergraduate training. 
This definition is used in the sourcebook for examining writing assessments. 

Source: U.S. Department of Education, National Center for Education Statistics, The NPEC 
Sourcebook on Assessment, Volume 1: Definitions and Assessment Methods for Critical Thinking, 
Problem Solving, and Writing, NCES, 2000, prepared by T. Dary Erwin for the Council of the National 
Postsecondary Education Cooperative, Student Outcomes Pilot Working Group: Cognitive and 
Intellectual Development. Washington, DC: U.S. Government Printing Office, 2000. 



APPENDIX D 



Assessment Methods Reviewed for Sourcebook 
Assessment Methods for Critical Thinking and Problem Solving 



Acronym       Test Name 

A. PROFILE    Academic Profile 
CAAP          Collegiate Assessment of Academic Proficiency 
CCTDI         California Critical Thinking Dispositions Inventory 
CTAB          CAAP Critical Thinking Assessment Battery 
CCTST         California Critical Thinking Skills Test 
CCTT          Cornell Critical Thinking Test 
COMP          College Outcome Measures Program (Objective Test) 
ETS TASKS     ETS Tasks in Critical Thinking 
MID           Measure of Intellectual Development 
PSI           Problem Solving Inventory 
RJI           Reflective Judgement Inventory 
WGCTA         Watson-Glaser Critical Thinking Appraisal 



Assessment Methods for Writing 



Acronym       Test Name 

CLEP          College-Level Examination Program 
SAT-II        Scholastic Aptitude Test 
AP            Advanced Placement 
CAAP          Collegiate Assessment of Academic Proficiency 
COMPASS       Computerized Adaptive Placement Assessment and Support System 
TASP          Texas Academic Skills Program 
CLAST         College-Level Academic Skills Test 
SEEW          Scale for Evaluating Expository Writing 
IIEP          Illinois Inventory of Educational Progress 
NJCBSPT       New Jersey College Basic Skills Placement Test 
COMP          College Outcome Measures Program 
MCAT          Medical College Admission Test 
TWE           Test of Written English 
GMAT          Graduate Management Admission Test 



The Academic Profile (1989) 

Long Form: 144 items 
Short Form: 36 items 

Publisher: Educational Testing Service 

Critical Thinking Component: The Academic Profile’s critical thinking component contains seven 
subscores that include questions in the following areas: humanities, social sciences, and natural sciences. 
Humanities questions require the student to recognize cogent interpretation of a poem, distinguish 
between rhetoric and argumentation, draw reasonable conclusions, and recognize elements of a 
humanities selection that strengthen or weaken the argument presented. Social science questions require 
the student to recognize assumptions made in a piece of social science writing, recognize the best 
hypothesis to account for information presented in a social science passage, and recognize information 
that strengthens or weakens arguments made in such a passage. Natural science questions require the 
student to recognize the best hypothesis to explain scientific phenomena, interpret relationships between 
variables in a passage, draw valid conclusions based on passage statements, and recognize information 
that strengthens or weakens arguments in the passage. 

Writing Component: The optional, content-related essay is designed to assist institutions with their 
general education outcome assessment. Students are required to apply concepts to material read or studied 
in related course work. The focus is on generating an analytic essay, integrating appropriate examples 
from coursework. 



California Critical Thinking Skills Test, Forms A & B (1990-1992) 
34 multiple-choice items 

Publisher: California Academic Press 



Critical Thinking Component: The CCTST provides a total critical thinking score, and also provides 
seven subscores that measure truth-seeking, open-mindedness, analyticity, systematicity, confidence, 
inquisitiveness, and cognitive maturity. Truth-seeking is defined as being eager for knowledge and having 
the courage to ask questions, even if knowledge fails to support or undermines preconceptions, beliefs, or 
self-interests. Open-mindedness is defined by tolerance for different views and self-monitoring for bias. 
Analyticity is defined as prizing application of reason/evidence, alertness to problematic situations, and 
anticipating consequences. Systematicity is defined as being organized, orderly, focused, and diligent in 
inquiry. Confidence is defined by trusting one’s own reasoning process. Inquisitiveness is defined as 
being curious and eager to acquire knowledge, even if applications are not immediate. Cognitive maturity is 
defined by prudence in making, suspending, or revising judgment, and awareness of multiple solutions. 

Collegiate Assessment of Academic Proficiency (1988) 

32 multiple-choice items 

Essay component with 72-item multiple-choice segment 
Publisher: American College Testing Program 



Critical Thinking Component: The CAAP CTT measures the ability to clarify, analyze, evaluate, and 
extend arguments. Subscores also measure analysis of the elements of the argument; evaluation of the 
argument; and extension of an argument. 

Writing Component: The CAAP writing component measures writing skills that are considered 
foundational for performance in upper-level college courses. Students are required to read a passage, and 
are then given a specific context in which to write an essay that argues a particular point. The knowledge 
required for this measure is consonant with the training and experience of college-level sophomores. 



College Basic Academic Subjects Examination (1989-1990) 

Essay 

Publisher: The Riverside Publishing Company 

Writing Component: The College BASE is used to assess competencies usually achieved through a 
general education curriculum. It is typically administered at the end of the sophomore year, but can be 
used at different times to assess change as a result of college experience. The College BASE is useful for 
diagnosing strengths and weaknesses of individual students and curricula. It is not designed for student 
selection into particular programs. 



College-Level Academic Skills Test (1984) 

Narrative/persuasive essay 

(multiple choice available) 

Publisher: Florida State Department of Education 

Writing Component: The CLAST is used for advancement to upper division courses and requires that 
students compose a persuasive essay. Essays are scored based on specifying a clear purpose; presenting a 
clear thesis; outlining an organized plan; presenting well-developed supporting paragraphs; providing 
specific, relevant details; using a variety of effective sentence patterns; making logical transitions; 
displaying effective word choice; and using correct, standard English. 



College Outcome Measures Program Objective Test (1976) 

60 multiple-choice items 
Writing skills assessment 

Publisher: American College Testing Program 

Critical Thinking Component: The COMP Objective Test provides a total critical thinking score and 
subscores for communicating, solving problems, clarifying values, functioning within social institutions, 
using science and technology, and using the arts. Communicating involves sending and receiving 
information in a variety of modes, within a variety of settings, and for a variety of purposes. Solving 
problems requires analyzing a variety of problems, selecting or creating solutions, and implementing 
solutions. Clarifying values involves identifying one’s personal values and the values of others, 
understanding how personal values develop, and analyzing implications of decisions made on personally 
held values. Functioning within social institutions involves identifying, analyzing, and understanding social 
institutions and their impact on one’s self and others. Using science and technology requires identifying, 
analyzing, and understanding technology and its impact on one’s self and others. Using the arts involves 
identifying, analyzing, and understanding art and its impact on one’s self and others. 

Writing Component: The COMP Writing Skills Assessment measures knowledge and skills that are acquired as 
a result of general education programs and that are important to effective adult functioning. This measure 
assists in program evaluation, but was not developed for making judgments about individual students. 
The COMP Writing Skills Assessment emphasizes practical application, rather than an academic focus. 
Students are required to write a personal letter to a U.S. senator and to a radio station. Content areas of 
social science, technology, and fine arts are covered in the three essays. 



Critical Thinking Assessment Battery (1997) 

32 multiple-choice items 

3 essays and 15 double multiple-choice questions 
15 ranked sets of questions 

Publisher: American College Testing Program 

Critical Thinking Component: The CTAB critical thinking component assesses skills in clarifying, analyzing, 
evaluating, and extending arguments. The applied reasoning component assesses skills in analyzing 
problems, generating logical and reasonable approaches to solve and implement solutions, and reflecting 
consistent value orientations. The engagement in reasoning and communicating component inventories 
past involvement in community/social contexts that require the application of problem solving and 
communication skills. 

Writing Component: The CTAB Persuasive Writing component assesses skills in written 
communication, including making contact with a relevant audience, organizing a persuasive message that 
develops a number of relevant ideas, and using language to present ideas clearly and effectively. 



New Jersey College Basic Skills Placement Test (1978) 

Essay 

Publisher: State of New Jersey 

Writing Component: The NJCBSPT is used to determine which students admitted to college need 
remedial instruction in basic skill areas in order to successfully complete college programs. Students are 
required to write unified paragraphs, organize their ideas, develop a logical argument, provide specific 
examples, use complete sentences with correct spelling, maintain a consistent tone, and express ideas 
precisely. 